Systems and methods to predict load data values

US10761844B2 · US · B2

Patent metadata
Publication number: US-10761844-B2
Application number: US-201816023407-A
Country: US
Kind code: B2
Filing date: Jun 29, 2018
Priority date: Jun 29, 2018
Publication date: Sep 1, 2020
Grant date: Sep 1, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed embodiments relate to predicting load data. In one example, a processor includes a pipeline having stages ordered as fetch, decode, allocate, write back, and commit, a training table to store an address, predicted data, a state, and a count of instances of unchanged return data, and tracking circuitry to determine, during one or more of the allocate and decode stages, whether a training table entry has a first state and matches a fetched first load instruction, and, if so, using the data predicted by the entry during the execute stage, the tracking circuitry further to update the training table during or after the write back stage to set the state of the first load instruction in the training table to the first state when the count reaches a first threshold.
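The lookup and update behavior summarized in the abstract (and spelled out in claim 1) can be sketched in software. This is a minimal illustration only, not the patented circuitry: the class and method names, the state labels `TRAINING` and `PREDICTING`, the threshold value, and the dictionary-based table are all assumptions made for the example. Overwriting the stored data on a mismatch is also an assumption; the claim text only says the count is reset and the state set to the second state.

```python
from dataclasses import dataclass

# Hypothetical labels: the patent only speaks of a "first" and a "second" state.
PREDICTING = "first_state"   # entry is trusted; its data may be used early
TRAINING = "second_state"    # entry is still being observed


@dataclass
class Entry:
    predicted_data: int
    state: str = TRAINING
    count: int = 0           # instances of unchanged return data


class TrainingTable:
    def __init__(self, threshold=3):          # threshold value is an assumption
        self.threshold = threshold
        self.entries = {}                      # load address -> Entry

    def lookup(self, address):
        """Allocate/decode-stage check: return predicted data only if a
        matching entry exists and is in the first state, else None."""
        entry = self.entries.get(address)
        if entry is not None and entry.state == PREDICTING:
            return entry.predicted_data
        return None

    def update(self, address, returned_data):
        """Update performed during or after write back, per the three cases."""
        entry = self.entries.get(address)
        if entry is None:
            # No match: add a new entry in the second state with a reset count.
            self.entries[address] = Entry(predicted_data=returned_data)
        elif entry.predicted_data != returned_data:
            # Match with different data: reset the count, fall back to second state.
            entry.count = 0
            entry.state = TRAINING
            entry.predicted_data = returned_data   # assumption, not in the claim
        else:
            # Match with unchanged data: count it, promote at the threshold.
            entry.count += 1
            if entry.count >= self.threshold:
                entry.state = PREDICTING
```

With `threshold=2`, a load at one address that returns the same value three times in a row is added on the first update, counted on the second, and promoted on the third; a later differing value demotes it again.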

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: fetch and decode circuitry to fetch and decode load instructions; a training table to store, for each of a plurality of load instructions, an address, predicted data, a state, and a count of instances of unchanged return data; and tracking circuitry to determine, during one or more of allocate and decode stages, whether a training table entry has a first state and matches a fetched first load instruction, and, if so, using the data predicted by the entry during an execute stage, the tracking circuitry further to update the training table during or after a write back stage to: when no match exists, add a new entry reflecting the first load instruction, when a match exists, but has different predicted data than data returned for the first load instruction, reset the count and set the state to a second state, and when a match exists with matching predicted data, increment the count and, when the incremented count reaches a first threshold, set the state to the first state.

2. The processor of claim 1, wherein, when the predicted data is used to optimize execution during the execute stage, the processor is further to await receipt of actual load data for the first load instruction, confirm whether the actual load data matches the predicted data, when a match is confirmed, accept results of executing the first load instruction and cause the first load instruction to be committed during a commit stage, and, otherwise, discard the optimized execution results and cause the first load instruction to be executed again.

3. The processor of claim 1, wherein each training table entry is further to store an optimization opportunity expiration timeout count, and the processor, when adding a new entry to the training table, is further to set the optimization opportunity expiration timeout count to a fixed number of clocks ahead of a current clock, the processor further to compare the optimization opportunity expiration timeout to the current clock when determining which of one or more entries to evict from the training table.

4. The processor of claim 1, wherein the processor, when setting the state to the first state, uses a move elimination operation by storing the predicted data from the training table entry to a SLT register in a register file, and using a pointer to the SLT register in a register table, the contents of the SLT register to be used as load data for subsequent instances of the first load instruction.

5. The processor of claim 1, wherein adding the new entry reflecting the first load instruction comprises setting the address stored in the new entry to a linear address of the first load instruction, setting the predicated data stored in the new entry to the data returned for the first load instruction, setting the state to the second state, and resetting the count of instances with unchanged data.

6. The processor of claim 1, wherein the training table is stored in memory being distinct from a register file.

7. The processor of claim 1, wherein the training table comprises one of a set-associative memory structure, a fully associative memory structure, and a direct-mapped memory structure.

8. The processor of claim 1, wherein the processor is further to evict a mispredicted entry from the training table, the mispredicted entry being one whose address matches that of the first load instruction, but whose predicted data differs from data returned for the first load instruction.

9. The processor of claim 8, wherein the processor is further to add each mispredicted load to a Bloom filter, and to use the Bloom filter when selecting a training table entry to evict by determining whether a load-to-evict is either possibly in the set or definitely not in the set.

10. The processor of claim 8, wherein the processor is to implement a lazy eviction scheme by storing, for each of the mispredicted load instructions, a mismatch count to track how many times the mispredicted load has been mispredicted, and to hold off evicting the mispredicted load until its mismatch count surpasses a second threshold.

11. A method comprising: storing a first load instruction in a training table comprising, for each entry, an address, predicted data, a state, and a count of instances of unchanged return data; determining, during one or more of allocate and decode stages, whether a training table entry having a stored address matching that of a fetched first load instruction exists and has a first state, and, if so, using the data predicted by the entry to optimize execution; and updating the training table during or after a write back stage by: when no match exists, adding a new entry reflecting the first load instruction, when a match exists, but has different predicted data than data returned for the first load instruction, resetting the count and setting the state to a second state, and when a match exists with matching predicted data, incrementing the count and, when the incremented count reaches a first threshold, setting the state to the first state.

12. The method of claim 11, wherein, when the predicted data is used to optimize execution during the execute stage, the processor is further to await receipt of actual load data for the first load instruction, confirm whether the actual load data matches the predicted data, when a match is confirmed, accept results of executing the first load instruction and cause the first load instruction to be committed during a commit stage, and, otherwise, discard the optimized execution results and cause the first load instruction to be executed again.

13. The method of claim 11, wherein each training table entry is further to store an optimization opportunity expiration timeout count, and the processor, when adding a new entry to the training table, is further to set the optimization opportunity expiration timeout count to a fixed number of clocks ahead of a current clock, the processor further to compare the optimization opportunity expiration timeout to the current clock when determining which of one or more entries to evict from the training table.

14. The method of claim 11, wherein the processor, when setting the state to the first state, uses a move elimination operation by storing the predicted data from the training table entry to a SLT register in a register file, and using a pointer to the SLT register as load data for subsequent instances of the first load instruction.

15. The method of claim 11, wherein the processor, when setting the state to the first state, uses a move elimination operation by storing the predicted data from the training table entry to a SLT register in a register file, and using a pointer to the SLT register in a register table, the contents of the SLT register to be used as load data for subsequent instances of the first load instruction.

16. The method of claim 11, wherein the training table is stored in memory being distinct from a register file.

17. The method of claim 11, wherein the training table comprises one of a set-associative memory structure, a fully associative memory structure, and a direct-mapped memory structure.

18. The method of claim 11, wherein the processor is further to evict a mispredicted entry from the training table, the mispredicted entry being one whose address matches that of the first load instruction, but whose predicted data differs from data returned for the first load instruction.

19. The method of claim 18, wherein the processor is further to add each mispredicted load to a Bloom filter, and to
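The eviction aids described in claims 9 and 10 (a Bloom filter of mispredicted loads and a lazy eviction scheme driven by a mismatch count) can be illustrated with a short sketch. Everything named here is hypothetical: `BloomFilter`, `LazyEvictor`, the bit-array size, the number of hash functions, and the second-threshold value are assumptions for illustration. SHA-256 stands in for whatever hashing real hardware would use, which the claims do not specify.

```python
import hashlib


class BloomFilter:
    """Set membership with two possible answers: 'possibly in the set'
    or 'definitely not in the set'."""

    def __init__(self, num_bits=256, num_hashes=3):   # sizes are assumptions
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0                                  # bit array packed into an int

    def _positions(self, address):
        # Derive num_hashes bit positions from the address (software stand-in).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, address):
        for pos in self._positions(address):
            self.bits |= 1 << pos

    def possibly_contains(self, address):
        # False means definitely not in the set; True means possibly in it.
        return all((self.bits >> pos) & 1 for pos in self._positions(address))


class LazyEvictor:
    """Holds off evicting a mispredicted load until its mismatch count
    surpasses a second threshold."""

    def __init__(self, second_threshold=2):            # value is an assumption
        self.second_threshold = second_threshold
        self.mismatch_counts = {}                       # load address -> mismatches
        self.mispredicted = BloomFilter()

    def record_mispredict(self, address):
        """Record one misprediction; return True if the entry should be evicted now."""
        self.mispredicted.add(address)
        count = self.mismatch_counts.get(address, 0) + 1
        self.mismatch_counts[address] = count
        return count > self.second_threshold
```

With `second_threshold=2`, the first two mispredictions of a load leave its entry in place and the third triggers eviction. A Bloom filter can report false positives but never false negatives, which is why the claim language distinguishes "possibly in the set" from "definitely not in the set".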

Assignees

Intel Corp

Inventors

Classifications

  • G06F9/3832 (Primary)

    Value prediction for operands; operand history buffers · CPC title

  • Instruction completion, e.g. retiring, committing or graduating · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • Dependency mechanisms, e.g. register scoreboarding · CPC title

  • Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10761844B2 cover?
Disclosed embodiments relate to predicting load data. In one example, a processor includes a pipeline having stages ordered as fetch, decode, allocate, write back, and commit, a training table to store an address, predicted data, a state, and a count of instances of unchanged return data, and tracking circuitry to determine, during one or more of the allocate and decode stages, whether a training table …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3832. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 1, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).