Coupling wide memory interface to wide write back paths

US10963379B2 · US · B2

Patent metadata
Publication number: US-10963379-B2
Application number: US-201815887640-A
Country: US
Kind code: B2
Filing date: Feb 2, 2018
Priority date: Jan 30, 2018
Publication date: Mar 30, 2021
Grant date: Mar 30, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for performing wide memory operations for a wide data cache line. In some examples of the disclosed technology, a processor having two or more execution lanes includes a data cache coupled to memory, a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache, and a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor, either into an operand buffer or bypassing the operand buffer. In some examples, a sharding circuit is provided that allows bitwise, byte-wise, and/or word-wise manipulation of memory operation data. In some examples, wide cache loads allow concurrent execution of plural execution lanes of the processor.
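The data flow in the abstract can be modelled in a few lines of Python. One read of a cache line yields several words, and a writeback step routes each word to a chosen lane. There the word either lands in the lane's operand buffer or bypasses it and goes straight to the execution unit.

This is a minimal behavioural sketch, not the patent's circuitry. The names `Lane`, `wide_load` and `writeback` are hypothetical, and a line of 16 words is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WORDS_PER_LINE = 16  # assumed: a 64-byte line of 32-bit words


@dataclass
class Lane:
    """One execution lane: an operand buffer plus a bypass input to its execution unit."""
    operand_buffer: List[int] = field(default_factory=list)
    bypass_input: Optional[int] = None


def wide_load(cache_line: List[int], start: int, count: int) -> List[int]:
    """Return `count` consecutive words of one cache line, modelling a single wide access."""
    if start < 0 or count < 1 or start + count > len(cache_line):
        raise ValueError("a wide load must stay within one cache line")
    return cache_line[start:start + count]


def writeback(words: List[int], lanes: List[Lane], targets: List[int], bypass: bool) -> None:
    """Route words[i] to lanes[targets[i]], buffering it or bypassing the buffer."""
    if len(words) != len(targets):
        raise ValueError("need exactly one target lane per loaded word")
    for word, lane_idx in zip(words, targets):
        lane = lanes[lane_idx]
        if bypass:
            lane.bypass_input = word          # delivered straight to the execution unit
        else:
            lane.operand_buffer.append(word)  # held until an instruction consumes it


line = [0x100 + i for i in range(WORDS_PER_LINE)]
lanes = [Lane() for _ in range(4)]
words = wide_load(line, start=4, count=4)  # words 4..7 from a single access
writeback(words, lanes, targets=[3, 2, 1, 0], bypass=False)
print([lane.operand_buffer for lane in lanes])  # [[263], [262], [261], [260]]
```

The `bypass` flag stands in for the choice the abstract describes between buffering a word and forwarding it directly. In hardware that choice would be made by selection logic rather than a Python conditional.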

First claim

Full claim text (claims 1 to 20; claim 1 is the first independent claim).

What is claimed is:

1. An apparatus comprising a processor having two or more execution lanes, the processor comprising: a data cache coupled to memory; a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache; and a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor, the writeback circuit including interconnect configurable to, during a particular clock cycle, select and send either words from the cache line of the data cache or words from the execution lanes to an operand buffer.

2. The apparatus of claim 1, wherein: the writeback circuit is further situated to send the respective word to an operand buffer for the selected execution lane.

3. The apparatus of claim 1, wherein: the writeback circuit is further situated to send the respective word to bypass an operand buffer by sending the respective word directly to an execution unit of the selected execution lane.

4. The apparatus of claim 3, wherein the respective word is not stored in an operand buffer.

5. The apparatus of claim 1, wherein: the selected execution lane is configured to perform a single instruction multiple data (SIMD) operation with the respective word, the operation being performed separately for each of two or more portions of the respective word.

6. The apparatus of claim 1, wherein the writeback circuit comprises electrical or photonic interconnect wires and selection logic comprising at least one of a logic multiplexer, pass-gate multiplexer, transmission gate multiplexer, or tri-state bus.

7. An apparatus comprising a processor having two or more execution lanes, the processor comprising: a data cache coupled to memory; a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache; a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor; and a sharding circuit coupled to the wide memory load circuit, the sharding circuit being configured to select individual words from the cache line and to send each of the selected words to a selected writeback channel of the processor.

8. The apparatus of claim 7, wherein the writeback circuit is further situated to send the respective word to bypass an operand buffer by sending the respective word directly to an execution unit of the selected execution lane.

9. An apparatus comprising a processor having two or more execution lanes, the processor comprising: a data cache coupled to memory; a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache; a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor; and a sharding circuit configured to reorder and send a respective word from the writeback circuit to a respective one of the selected execution lanes.

10. The apparatus of claim 9, wherein: the sharding circuit is configured to reorder the respective word by performing at least one of the following operations: shift, rotate, reverse, move, swap, transpose, replicate, extract, or extend.

11. The apparatus of claim 9, wherein the writeback circuit is further situated to send the respective word to bypass an operand buffer by sending the respective word directly to an execution unit of the selected execution lane.

12. A method of operating a processor having a plurality of execution lanes, the method comprising: concurrently loading a plurality of two or more words from a single read port of a data cache; sending a selected word of the plurality of words to a selected one of the execution lanes of the processor; receiving the selected word via a writeback circuit; and with the selected execution lane, multiplying the received word by an operand output, thereby producing a product.

13. The method of claim 12, wherein: the selected word is sent to the selected one of the execution lanes via a writeback path, the writeback path being adapted to select and send at least an execution lane output or the selected word to the selected execution lane.

14. The method of claim 12, wherein: the selected word bypasses an operand buffer and is sent directly to execution resources of the selected execution lane.

15. The method of claim 12, wherein the selected word is a first selected word and wherein the plurality of words is a first plurality of words, further comprising: concurrently loading a second plurality of two or more words from a single read port of a data cache; and adding a second, selected word of the second plurality of words to the product.

16. One or more computer-readable storage media storing computer-readable instructions that when executed by a processor, cause the processor to perform a method, the method comprising: identifying at least one vector operation in code for at least one instruction block; and emitting object code for the at least one instruction block that, when the object code is executed by a processor, causes the processor executing the object code to perform a method, the method comprising: concurrently loading a plurality of two or more words from a single read port of a data cache, sending a selected word of the plurality of words to a selected one of the execution lanes of the processor, receiving the selected word via a writeback circuit, and with the selected execution lane, multiplying the received word by an operand output, thereby producing a product.

17. The computer-readable storage media of claim 16, wherein the object code includes at least one instruction encoded to indicate that the plurality of words is to be loaded and sent to the selected execution lanes.

18. A method of operating a processor, comprising: with a wide memory load circuit, concurrently loading two or more words from a cache line of a data cache coupled to memory; and with a writeback circuit, sending a respective word of the concurrently-loaded words to a selected execution lane of the processor; and with a sharding circuit coupled to the wide memory load circuit, selecting individual words from the cache line and sending each of the selected words to a selected writeback channel of the processor.

19. The method of claim 18, further comprising: with the writeback circuit, sending the respective word to an operand buffer for the selected execution lane.

20. The method of claim 18, further comprising: with the writeback circuit, sending the respective word to bypass an operand buffer by sending the respective word directly to an execution unit of the selected execution lane.
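Claims 9 and 10 describe a sharding circuit that reorders loaded words before they reach their lanes. Claims 12 and 15 describe each lane multiplying its word by an operand and then adding a word from a second wide load.

The Python sketch below strings these steps together as a software analogue. Only four of the reorder operations listed in claim 10 are modelled. The names `shard`, `SHARD_OPS` and `lanes_multiply_add` are hypothetical. Treating the "operand output" of claim 12 as one integer per lane is an assumption.

```python
from typing import Callable, Dict, List


def _swap_pairs(words: List[int]) -> List[int]:
    """Exchange each even/odd pair of words (requires an even count)."""
    if len(words) % 2:
        raise ValueError("swap needs an even number of words")
    return [words[i ^ 1] for i in range(len(words))]


def _rotate_left(words: List[int]) -> List[int]:
    """Rotate words one position toward index 0."""
    return words[1:] + words[:1]


# A few of the reorder operations named in claim 10; the table itself is hypothetical.
SHARD_OPS: Dict[str, Callable[[List[int]], List[int]]] = {
    "reverse": lambda w: w[::-1],
    "rotate": _rotate_left,
    "swap": _swap_pairs,
    "replicate": lambda w: [w[0]] * len(w),  # broadcast the first word to every lane
}


def shard(words: List[int], op: str) -> List[int]:
    """Reorder loaded words before each one is handed to its lane's writeback channel."""
    return SHARD_OPS[op](words)


def lanes_multiply_add(first: List[int], operands: List[int], second: List[int]) -> List[int]:
    """Lane i computes first[i] * operands[i] (claim 12), then adds second[i] (claim 15)."""
    if not (len(first) == len(operands) == len(second)):
        raise ValueError("one word per lane is expected")
    products = [w * op for w, op in zip(first, operands)]
    return [p + w2 for p, w2 in zip(products, second)]


first = shard([1, 2, 3, 4], "reverse")  # [4, 3, 2, 1]
print(lanes_multiply_add(first, [10, 10, 10, 10], [1, 1, 1, 1]))  # [41, 31, 21, 11]
```

Read together, claims 12 and 15 amount to a per-lane multiply-add fed by two wide loads. The sketch computes lanes in a list comprehension only for brevity; the claims describe the lanes operating concurrently.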

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications (CPC titles)

  • using a mask

  • Instruction completion, e.g. retiring, committing or graduating

  • Result writeback, i.e. updating the architectural state or memory

  • controlled by a single instruction for multiple data lanes [SIMD]

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
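Claim 5 has a lane perform a SIMD operation "separately for each of two or more portions" of one word. The CPC titles above also refer to packed data and masks.

The following Python sketch uses SIMD-within-a-register (SWAR) addition as a software analogue of that idea. It treats a 32-bit word as four 8-bit portions, and an optional 4-bit mask decides which portions receive the result. The byte width, the mask encoding and the function names are assumptions for illustration. This is not the patent's hardware, and inputs are assumed to fit in 32 bits.

```python
BYTE_HIGH = 0x80808080  # top bit of each 8-bit lane
BYTE_LOW = 0x7F7F7F7F   # remaining seven bits of each lane


def swar_add8(a: int, b: int) -> int:
    """Add four packed 8-bit lanes of two 32-bit words, modulo 256 per lane."""
    low = (a & BYTE_LOW) + (b & BYTE_LOW)  # at most 0xFE per lane, so no carry crosses lanes
    return (low ^ ((a ^ b) & BYTE_HIGH)) & 0xFFFFFFFF  # fold each lane's top bit back in


def expand_mask(mask_bits: int) -> int:
    """Turn a 4-bit lane mask into a 32-bit byte mask (bit i enables byte i)."""
    m = 0
    for i in range(4):
        if (mask_bits >> i) & 1:
            m |= 0xFF << (8 * i)
    return m


def masked_swar_add8(a: int, b: int, mask_bits: int) -> int:
    """Lanes enabled by the mask get a+b; disabled lanes keep the value from a."""
    m = expand_mask(mask_bits)
    return (swar_add8(a, b) & m) | (a & ~m & 0xFFFFFFFF)


print(hex(swar_add8(0x01FF7F10, 0x01010101)))                # 0x2008011
print(hex(masked_swar_add8(0x01FF7F10, 0x01010101, 0b0101)))  # 0x1007f11
```

Masking each lane's top bit before the add keeps carries from spilling into the neighbouring portion. That is what makes the four additions independent even though they share one integer.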

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10963379B2 cover?
A processor with two or more execution lanes whose wide memory load circuit concurrently loads two or more words from a single data cache line. Its writeback circuit sends each loaded word to a selected execution lane, either into that lane's operand buffer or directly to its execution unit. Some examples add a sharding circuit that selects or reorders words before writeback.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Published on March 30, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).