Low latency synchronization for operation cache and instruction cache fetching and decoding instructions

US10896044B2 · US · B2

Patent metadata
Publication number: US-10896044-B2
Application number: US-201816014715-A
Country: US
Kind code: B2
Filing date: Jun 21, 2018
Priority date: Jun 21, 2018
Publication date: Jan 19, 2021
Grant date: Jan 19, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The techniques described herein provide an instruction fetch and decode unit having an operation cache with low latency in switching between fetching decoded operations from the operation cache and fetching and decoding instructions using a decode unit. This low latency is accomplished through a synchronization mechanism that allows work to flow through both the operation cache path and the instruction cache path until that work is stopped due to needing to wait on output from the opposite path. The existence of decoupling buffers in the operation cache path and the instruction cache path allows work to be held until that work is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that allows for detection of multiple hits in a single cycle, also improve latency by, for example, improving the speed at which entries are consumed from a prediction queue that stores predicted address blocks.
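The synchronization idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the patented hardware: the names `build_queues` and `drain`, the one-entry-per-cycle timing, and the rule that an entry waits only when the immediately preceding work went down the opposite path are all hypothetical simplifications chosen for illustration.

```python
from collections import deque


def build_queues(paths):
    """Split program-ordered work into two decoupling queues.

    `paths` lists, in program order, which path handles each piece of work:
    "oc" (operation cache path) or "ic" (instruction cache path).
    Each queue entry is (seq, wait). The wait flag is an assumed encoding of
    the "wait for the opposite path" indication.
    """
    queues = {"oc": deque(), "ic": deque()}
    for seq, path in enumerate(paths):
        # Assumption: wait only if the previous work used the other path.
        wait = seq > 0 and paths[seq - 1] != path
        queues[path].append((seq, wait))
    return queues


def drain(queues):
    """Advance both paths cycle by cycle.

    Returns the order in which work reaches the operations queue.
    """
    ops_queue = []
    while queues["oc"] or queues["ic"]:
        done = set(ops_queue)  # what had been written before this cycle
        progressed = False
        for path in ("oc", "ic"):
            if not queues[path]:
                continue
            seq, wait = queues[path][0]
            if wait and (seq - 1) not in done:
                continue  # held in the buffer until the other path catches up
            queues[path].popleft()
            ops_queue.append(seq)
            progressed = True
        if not progressed:
            raise RuntimeError("both paths are stalled")
    return ops_queue


order = drain(build_queues(["oc", "oc", "ic", "ic", "oc"]))
print(order)
```

In this model each path keeps its own FIFO order, and an entry stalls only at a switch between paths, so the combined output stays in program order without a global lock on both paths.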

First claim

Opening claim text (preview).

What is claimed is:

1. A method for converting instruction addresses of a first predicted address block into decoded micro-operations for output to an operations queue that stores decoded micro-operations in program order, and for subsequent execution by a remainder of an instruction execution pipeline, the method comprising: providing an index associated with the first predicted address block to an operation cache tag array to obtain a first tag associated with a first operation cache tag array entry and a second tag associated with a second operation cache tag array entry; in a first computer clock cycle, determining that both the first tag and the second tag match a tag derived from the predicted address block and that an end address associated with the first operation cache tag array entry matches a start address associated with the second operation cache tag array entry; in the first computer clock cycle, identifying a set of instructions for which decoded micro-operations are stored in an operation cache data array of an operation cache path, as instructions at addresses associated with both the first operation cache tag array entry and the second operation cache tag array entry; storing a first operation cache queue entry for the set of instructions in an operation cache queue, the first operation cache queue entry including an indication indicating whether to wait to receive a signal from an instruction cache path to proceed; obtaining decoded micro-operations corresponding to the first operation cache queue entry from the operation cache; and outputting the decoded micro-operations corresponding to the first operation cache queue entry to the operations queue, at a time that is based on the indication of the first operation cache queue entry indicating whether to wait to receive the signal from the instruction cache path to proceed.

2. The method of claim 1, wherein: the indication indicates a need to wait for prior micro-operations from the instruction cache path to be written to or in-flight to the operations queue; and outputting the decoded micro-operations to the operations queue includes waiting until the prior micro-operations from the instruction cache path are written to or in-flight to the operations queue before outputting the decoded micro-operations corresponding to the first operation cache queue to the operations queue.

3. The method of claim 1, wherein: the indication does not indicate a need to wait for prior micro-operations from the instruction cache path to be written to or in-flight to the operations queue; and outputting the decoded micro-operations to the operations queue includes outputting the decoded micro-operations corresponding to the first operation cache queue entry without waiting for prior micro-operations from the instruction path.

4. The method of claim 1, further comprising: converting instruction addresses of a second predicted address block into decoded micro-operations for output to the operations queue and for subsequent execution by the remainder of an instruction execution pipeline, by: identifying that the second predicted address block includes at least one instruction for which decoded micro-operations are not stored in the operation cache; storing an instruction cache queue entry in an instruction cache queue; obtaining instruction bytes for the instruction cache queue entry in an instruction byte buffer, along with an indication indicating whether to wait for prior operations from the operation cache to be written to or in flight to the operations cache; decoding the instruction bytes to obtain decoded micro-operations corresponding to the instruction byte buffer entry, at a time that is based on the indication indicating whether to wait for the prior operations from the operations cache path; and outputting the decoded micro-operations corresponding to the instruction byte buffer entry to the operations queue for storage.

5. The method of claim 4, wherein: the indication indicating whether to wait for the prior operations from the operations cache path indicates a need to wait for the prior operations from the operations cache path; and decoding the instruction bytes comprises decoding the instruction bytes after the prior operations from the operations cache path are written to or in flight to the operations queue.

6. The method of claim 4, wherein: the indication indicating whether to wait for the prior operations from the operations cache path indicates no need to wait for the prior operations from the operations cache path; and decoding the instruction bytes comprises decoding the instruction bytes without waiting for prior operations from the operations cache path to be written to or in flight to the operations queue.

7. The method of claim 1, wherein: the first predicted address block includes at least one instruction for which decoded micro-operations are stored in an operation cache and at least one instruction for which decoded operations are not stored in the operation cache.

8. The method of claim 1, further comprising: executing the micro-operations stored in the operations queue.

9. An instruction fetch and decode unit for converting instruction addresses of a first predicted address block into decoded micro-operations for output to an operations queue that stores decoded micro-operations in program order, and for subsequent execution by a remainder of an instruction execution pipeline, the instruction fetch and decode unit comprising: a shared fetch logic configured to: provide an index associated with the first predicted address block to an operation cache tag array to obtain a first tag associated with a first operation cache tag array entry and a second tag associated with a second operation cache tag array entry, in a first computer clock cycle, determine that both the first tag and the second tag match a tag derived from the predicted address block and that an end address associated with the first operation cache tag array entry matches a start address associated with the second operation cache tag array entry, and in the first computer clock cycle, identify a set of instructions for which decoded micro-operations are stored in an operation cache data array of an operation cache path, as instructions at addresses associated with both the first operation cache tag array entry and the second operation cache tag array entry; an operation cache queue configured to store a first operation cache queue entry for the set of instructions, the first operation cache queue entry including an indication indicating whether to wait to receive a signal from an instruction cache path to proceed; and an operation cache data read logic configured to obtain decoded micro-operations corresponding to the first operation cache queue entry from the operation cache, and to output the decoded micro-operations corresponding to the first operation cache queue entry to the operations queue, at a time that is based on the indication of the first operation cache queue entry indicating whether to wait to receive the signal from the instruction cache path to proceed.

10. The instruction fetch and decode unit of claim 9, wherein: the indication indicates a need to wait for prior micro-operations from the instruction cache path to be written to or in-flight to the operations queue; and outputting the decoded micro-operations to the operations queue includes waiting until the prior micro-operations from the instruction cache path are written to or in-flight to the operations queue before outputting the decoded micro-operations corresponding to the first operation cache queue to the operations queue.

11. The instr
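The tag-array step recited in claims 1 and 9 (two tag hits found in one lookup, chained because the first entry's end address equals the second entry's start address) can be sketched in software. This is a rough illustration under assumptions, not the claimed circuit: `TagEntry`, `lookup`, the dictionary-of-lists layout, the exclusive end offset, and the rule that the first hit must begin at the block's start offset are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagEntry:
    tag: int    # tag stored in the operation cache tag array entry
    start: int  # start offset covered by the entry (assumed field)
    end: int    # end offset covered by the entry, exclusive (assumed field)


def lookup(tag_array, index, tag, start):
    """Return the chain of up to two entries hit by a single lookup.

    `tag_array` maps an index to the entries stored at that index.
    Hardware would compare all entries in parallel within one clock cycle;
    this loop-based version only models the outcome of that comparison.
    """
    hits = [e for e in tag_array.get(index, []) if e.tag == tag]
    first = next((e for e in hits if e.start == start), None)
    if first is None:
        return []
    # The second entry chains onto the first: its start equals the first's end.
    second = next(
        (e for e in hits if e is not first and e.start == first.end), None
    )
    return [first] if second is None else [first, second]


tag_array = {
    5: [TagEntry(0xA, 0, 16), TagEntry(0xA, 16, 40), TagEntry(0xB, 16, 32)],
}
chain = lookup(tag_array, index=5, tag=0xA, start=0)
print([(e.start, e.end) for e in chain])
```

When both entries hit, the set of instructions with cached micro-operations spans the addresses of both entries, so a single lookup covers more of the predicted address block than a single-hit design would.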

Assignees

Advanced Micro Devices Inc

Inventors

Classifications

  • using decoder, e.g. decoder per instruction set, adaptable or programmable decoders

  • Prefetch instructions; cache control instructions

  • using dynamic branch prediction, e.g. using branch history tables

  • Implementation provisions of instruction buffers, e.g. prefetch buffer; banks

  • Pipelined decoding, e.g. using predecoding

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10896044B2 cover?
The techniques described herein provide an instruction fetch and decode unit having an operation cache with low latency in switching between fetching decoded operations from the operation cache and fetching and decoding instructions using a decode unit. This low latency is accomplished through a synchronization mechanism that allows work to flow through both the operation cache path and the ins…
Who is the assignee on this patent?
Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/30047. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 19, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).