Streaming engine with early exit from loop levels supporting early exit loops and irregular loops

US11714646B2 · US · B2

Patent metadata
Publication number: US-11714646-B2
Application number: US-202117163639-A
Country: US
Kind code: B2
Filing date: Feb 1, 2021
Priority date: Jun 29, 2017
Publication date: Aug 1, 2023
Grant date: Aug 1, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. Upon a stream break instruction specifying one of the nested loops, the streaming engine ends a current iteration of the loop. If the specified loop was not the outermost loop, the streaming engine begins an iteration of a next outer loop. If the specified loop was the outermost nested loop, the streaming engine ends the stream. The streaming engine places a vector of data elements in order in lanes within a stream head register. A stream break instruction is operable upon a vector break.
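The early-exit behavior described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `StreamAddressGenerator`, the per-loop `counts` and `strides`, and the address formula (base plus the sum of index times stride) are all hypothetical and chosen for illustration. Loop 0 is treated as the innermost loop.

```python
from typing import List, Optional


class StreamAddressGenerator:
    """Toy model of a nested-loop address generator with early exit.

    Loop 0 is the innermost loop; the last loop is the outermost.
    Names and the address formula are illustrative assumptions.
    """

    def __init__(self, base: int, counts: List[int], strides: List[int]):
        if not counts or len(counts) != len(strides):
            raise ValueError("counts and strides must be non-empty and equal length")
        self.base = base
        self.counts = counts
        self.strides = strides
        self.index = [0] * len(counts)
        self.done = False

    def _advance(self, level: int) -> None:
        """Step the loop at `level`, carrying into outer loops on wrap."""
        while level < len(self.counts):
            self.index[level] += 1
            if self.index[level] < self.counts[level]:
                return
            self.index[level] = 0
            level += 1
        self.done = True  # the outermost loop finished, so the stream ends

    def next_address(self) -> Optional[int]:
        """Return the next element address, or None once the stream has ended."""
        if self.done:
            return None
        addr = self.base + sum(i * s for i, s in zip(self.index, self.strides))
        self._advance(0)
        return addr

    def stream_break(self, level: int) -> None:
        """Abandon the loop at `level`, then step the next outer loop.

        Breaking the outermost loop ends the stream.
        """
        if self.done:
            return
        for inner in range(level + 1):
            self.index[inner] = 0
        self._advance(level + 1)


# Usage: a 4 x 3 stream where the inner loop is cut short after two elements.
gen = StreamAddressGenerator(base=0, counts=[4, 3], strides=[1, 10])
first_two = [gen.next_address(), gen.next_address()]  # [0, 1]
gen.stream_break(0)                  # leave the inner loop early
after_break = gen.next_address()     # 10, the start of the next outer iteration
```

Breaking loop 0 skips the rest of the inner loop and resumes at the next outer iteration; breaking the last loop marks the stream as finished, matching the two cases the abstract distinguishes.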

First claim

Opening claim text (preview).

What is claimed is:

1. A circuit device comprising: a streaming engine that includes: an address generator configured to generate a set of addresses based on a set of loops, wherein the address generator includes a first logic circuit comprising a first input configured to receive a register status signal, a second input configured to receive a break signal that is based on a stream break instruction, and an output configured to end a current iteration of a loop of the set of loops based on the stream break instruction once the register status signal is asserted, such that the current iteration of the loop continues until the register status signal is asserted; a memory interface coupled to the address generator and configured to: couple to a memory; and retrieve a set of data from the memory based on the set of addresses generated by the address generator; and a stream head register coupled to the memory interface and to the address generator and configured to: receive the set of data from the memory interface; couple to a functional unit of a processor; provide the set of data to the functional unit as a data stream; and provide the register status signal to the address generator.

2. The circuit device of claim 1, wherein the stream head register is configured to assert the register status signal based upon the stream head register being full.

3. The circuit device of claim 1, wherein the address generator is further configured to: compare a loop count associated with the loop to a total number of iterations of the loop; and end the current iteration of the loop based on the comparison of the loop count to the total number of iterations of the loop.

4. The circuit device of claim 1, wherein the address generator is configured to, based on the ending of the current iteration of a first loop, determine whether to begin an iteration of a next-most outer loop or to end the data stream based on whether the first loop is an outermost loop.

5. The circuit device of claim 1, wherein the stream break instruction specifies the loop from among the set of loops.

6. The circuit device of claim 1, wherein the address generator is configured to receive a set of stream break instructions, wherein each instruction of the set of stream break instructions is associated with a respective loop of the set of loops.

7. The circuit device of claim 1, wherein the streaming engine is configured to: receive a read-and-advance instruction from the functional unit; and based on the read-and-advance instruction: provide a first data element of the data stream from the stream head register to the functional unit; and load a second data element of the data stream into the stream head register.

8. The circuit device of claim 7, wherein the streaming engine is configured to: receive a read instruction from the functional unit; and based on the read instruction, provide a third data element of the data stream from the stream head register to the functional unit without loading a fourth data element of the data stream into the stream head register.

9. The circuit device of claim 1, wherein: the streaming engine includes a stream template register coupled to the address generator; and the stream template register is configured to store a stream template that specifies total numbers of iterations for the set of loops.

10. The circuit device of claim 9, wherein the stream template further specifies a number of loops in the set of loops.

11. The circuit device of claim 1, wherein the stream head register is configured as a first-in-first-out buffer.

12. The circuit device of claim 1 further comprising a system-on-a-chip that includes the processor and the memory.

13. The circuit device of claim 1, wherein: the memory is a level two cache of a cache hierarchy; and the memory interface is configured to retrieve the set of data from the level two cache via a data path that does not include a level one cache of the cache hierarchy.
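The difference between the read instruction and the read-and-advance instruction in claims 7 and 8 can be sketched in software. This is a minimal illustration, assuming a single-element head slot backed by a queue of pending elements; the class `StreamHead` and its method names are hypothetical and do not come from the patent.

```python
from collections import deque
from typing import Deque, Iterable, Optional


class StreamHead:
    """Toy stream head: one visible element plus a queue of pending elements."""

    def __init__(self, data: Iterable[int]):
        self._pending: Deque[int] = deque(data)
        self._head: Optional[int] = self._load_next()

    def _load_next(self) -> Optional[int]:
        # Pull the next element of the stream into the head slot, if any remain.
        return self._pending.popleft() if self._pending else None

    def read(self) -> Optional[int]:
        """Supply the head element without loading a new one (claim 8 style)."""
        return self._head

    def read_and_advance(self) -> Optional[int]:
        """Supply the head element, then load the following one (claim 7 style)."""
        value = self._head
        self._head = self._load_next()
        return value


# Usage: repeated reads see the same element until an advance happens.
head = StreamHead([5, 6, 7])
same_twice = (head.read(), head.read())   # (5, 5)
consumed = head.read_and_advance()        # 5
now_at_head = head.read()                 # 6
```

A plain read lets several operations reuse the same operand, while read-and-advance consumes it and moves the stream forward.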
14. A method comprising: receiving a stream open instruction; based on the stream open instruction, generating a set of addresses using a set of loops; retrieving a set of data associated with the set of addresses from a memory; providing the set of data to a register; providing the set of data from the register to a functional unit of a processor as a data stream; and asserting a register status signal based on the register, wherein the generating of the set of addresses includes determining whether to end an iteration of a loop of the set of loops using an output of a first logic circuit based on the register status signal and based on a break instruction such that the iteration of the loop of the set of loops continues until the register status signal is asserted, wherein the first logic circuit comprises a first input that receives the register status signal, and a second input that receives a break signal based on the break instruction.

15. The method of claim 14, wherein the register status signal specifies whether the register is full.

16. The method of claim 14, wherein the generating of the set of addresses includes: comparing a loop count associated with the loop to a total number of iterations of the loop; and determining whether to end the iteration of the loop based on the comparing of the loop count to the total number of iterations of the loop.

17. The method of claim 14, wherein the generating of the set of addresses includes, based on ending of the iteration of the loop, determining whether to begin an iteration of a next-most outer loop or end the data stream based on whether the loop is an outermost loop.

18. The method of claim 14, wherein the break instruction specifies the loop from among the set of loops.

19. The method of claim 14, wherein the providing of the set of data from the register to the functional unit as the data stream includes: receiving a read-and-advance instruction from the functional unit; and based on the read-and-advance instruction: providing a first data element of the data stream from the register to the functional unit; and loading a second data element of the data stream into the register.

20. The method of claim 19, wherein the providing of the set of data from the register to the functional unit as the data stream includes: receiving a read instruction from the functional unit; and based on the read instruction, providing a third data element of the data stream from the register to the functional unit without loading a fourth data element of the data stream into the register.

21. The circuit device of claim 1, wherein the first logic circuit comprises an AND gate having a first input coupled to the first input of the first logic circuit, a second input coupled to the second input of the first logic circuit, and an output coupled to the output of the first logic circuit.

22. The circuit device of claim 21, wherein a break flag bit is configured to be asserted in response to the stream break instruction, and wherein the second input of the AND gate is coupled to the break flag bit.
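The gating recited in claims 1, 2, 21 and 22 (a break flag ANDed with a register status signal) can be modeled in a few lines. This is a hedged sketch, not the claimed circuit: the class `BreakLogic` and its method names are hypothetical, and clearing the flag after it takes effect is an assumption that the claims do not state.

```python
from dataclasses import dataclass


@dataclass
class BreakLogic:
    """Toy model of the AND-gated early exit from a loop iteration."""

    break_flag: bool = False  # set in response to a stream break instruction

    def stream_break(self) -> None:
        """Record that a stream break instruction was received."""
        self.break_flag = True

    def end_iteration(self, register_full: bool) -> bool:
        """Return the AND of the register status signal and the break flag.

        The current iteration keeps running until the status signal
        (here: stream head register full, as in claim 2) is asserted.
        """
        end = register_full and self.break_flag
        if end:
            # Assumption for this sketch: the flag clears once it is acted on.
            self.break_flag = False
        return end


# Usage: a pending break only takes effect once the register reports full.
logic = BreakLogic()
logic.stream_break()
while_filling = logic.end_iteration(register_full=False)  # False
once_full = logic.end_iteration(register_full=True)       # True
```

Deferring the exit until the status signal is asserted means the vector already being assembled in the stream head register is completed before the loop iteration ends.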

Assignees

Inventors

Classifications

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • using a mask · CPC title

  • Loop control instructions; iterative instructions, e.g. LOOP, REPEAT · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • Prefetch instructions; cache control instructions · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11714646B2 cover?
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. Upon a stream break instruction specifying one of the nested loops, the streaming engine ends a current iteratio…
Who is the assignee on this patent?
Texas Instruments Inc, Texas Instruments Incorporated
What technology area does this patent fall under?
Primary CPC classification G06F9/30065. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 1, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).