Streaming engine for machine learning architecture

US12112174B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12112174-B2
Application number: US-201816226534-A
Country: US
Kind code: B2
Filing date: Dec 19, 2018
Priority date: Feb 8, 2018
Publication date: Oct 8, 2024
Grant date: Oct 8, 2024

Abstract

Official abstract text for this publication.

A programmable hardware system for machine learning (ML) includes a core and a streaming engine. The core receives a plurality of commands and a plurality of data from a host to be analyzed and inferred via machine learning. The core transmits a first subset of commands of the plurality of commands that is performance-critical operations and associated data thereof of the plurality of data for efficient processing thereof. The first subset of commands and the associated data are passed through via a function call. The streaming engine is coupled to the core and receives the first subset of commands and the associated data from the core. The streaming engine streams a second subset of commands of the first subset of commands and its associated data to an inference engine by executing a single instruction.
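The split described in the abstract, where only performance-critical commands travel onward through a function call, can be illustrated with a minimal sketch. The operation categories follow the wording of claim 1; the string spellings, the `Command` type, and the `divide_commands` and `transmit` helpers are hypothetical names chosen for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Categories follow claim 1; the exact string spellings are assumptions.
PERFORMANCE_CRITICAL = {
    "matrix", "tanh", "sigmoid", "memory_transpose", "addition",
    "tree", "graph", "priority_queue",
}
PERFORMANCE_NONCRITICAL = {"data_collection", "data_mapping"}


@dataclass(frozen=True)
class Command:
    """Hypothetical command record: an operation name plus its data."""
    op: str
    data: Any = None


def divide_commands(commands: List[Command]) -> Tuple[List[Command], List[Command]]:
    """Split commands into (performance-critical, performance-noncritical)."""
    critical, noncritical = [], []
    for cmd in commands:
        if cmd.op in PERFORMANCE_CRITICAL:
            critical.append(cmd)
        elif cmd.op in PERFORMANCE_NONCRITICAL:
            noncritical.append(cmd)
        else:
            raise ValueError(f"unknown operation: {cmd.op}")
    return critical, noncritical


def transmit(critical: List[Command], function_call: Callable[[str, Any], None]) -> None:
    """Pass each critical command and its data as parameters of a function call."""
    for cmd in critical:
        function_call(cmd.op, cmd.data)


# Usage: only the critical commands reach the (stand-in) function call.
sent = []
commands = [
    Command("matrix", [[1, 2], [3, 4]]),
    Command("data_mapping"),
    Command("tanh", [0.5]),
]
critical, noncritical = divide_commands(commands)
transmit(critical, lambda op, data: sent.append((op, data)))
```

In this sketch the noncritical list simply stays with the caller, which mirrors the claim language that the second subset is not transmitted to the inference engine.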

First claim

Opening claim text (preview).

What is claimed is:

1. A programmable hardware system for machine learning (ML), comprising:

  a core configured to

    receive a plurality of commands and data from a host to be analyzed and inferred via machine learning;

    divide the plurality of commands into a first subset of commands associated with performance-critical operations and a second subset of commands associated with performance-noncritical operations, wherein the performance-critical operations include at least one or more of a matrix operation, tanh operation, sigmoid operation, memory transpose operation, addition operations, and operations on one or more of any of a tree, a graph, and a priority queue, wherein the performance-noncritical operations include at least one or more of data collection and data mapping, wherein the performance-critical operations exclude any of data collection and data mapping, wherein the performance-noncritical operations exclude any of a matrix operation, tanh operation, sigmoid operation, memory transpose operation, addition operations, and operations on one or more of a tree, a graph, and a priority queue;

    transmit each command of the first subset of commands of the plurality of commands for performance-critical operations and associated data thereof to an inference engine for processing via a function call, wherein the each command of the first subset of commands and/or the associated data are encapsulated as parameters in the function call, wherein the second subset of commands associated with performance-noncritical operations is not transmitted to the inference engine;

  an instruction streaming engine coupled to the core and further coupled to the inference engine, wherein the streaming engine is configured to

    retrieve and maintain the each command of the first subset of commands and/or the associated data from the function call at a specific location in a buffer;

    stream the each command of the first subset of commands and/or its associated data to the inference engine from the buffer; and

  said inference engine configured to

    retrieve the each command of the first subset of commands and/or its associated data streamed from the buffer;

    perform the performance-critical operations according to the each command of the first subset of commands;

    analyze the data; and

    infer a subject from the data.

2. The programmable hardware system of claim 1, wherein the streaming engine is further configured to receive a streamed inferred data from the inference engine.

3. The programmable hardware system of claim 2, wherein the streaming engine is further configured to stream the received inferred data to the core.

4. The programmable hardware system of claim 1, wherein the streaming engine comprises: an instruction streaming engine coupled to the core, wherein the instruction streaming engine is configured to stream the first subset of commands to the inference engine; and a data streaming engine coupled to the inference engine and configured to generate one or more streams of data associated with the first subset of commands, and wherein the data streaming engine is configured to stream the one or more streams of data to the inference engine to be analyzed and inferred.

5. The programmable hardware system of claim 1, wherein the streaming engine is configured to stream instructions to the inference engine in an instruction set architecture that is different from an instruction set architecture format received from the core.

6. The programmable hardware system of claim 1, wherein the buffer is coupled to the core and further coupled to the streaming engine, wherein the core continuously writes to the buffer until a certain condition is met, and wherein the streaming engine continuously reads from the buffer until another certain condition is met.

7. The programmable hardware system of claim 6, wherein the certain condition is when available buffer associated with the buffer is below a threshold value.

8. The programmable hardware system of claim 7, wherein the available buffer is tracked using a head pointer maintained by the core locally, and wherein the head pointer is incremented each time the core writes to the buffer and the available buffer associated with the buffer is decremented each time the core writes to the buffer.

9. The programmable hardware system of claim 7, wherein the core reads a value stored in a memory mapped input/output (MMIO) responsive to the certain condition being met, wherein the MMIO stores a value of the head pointer and a tail pointer associated with a location the streaming engine reads from the buffer, and wherein the core is configured to set the available buffer size.

10. The programmable hardware system of claim 9, wherein the core is configured to set the available buffer size to the tail pointer minus the head pointer and result thereof modulo actual size of the buffer.

11. The programmable hardware system of claim 6, wherein the another certain condition is when buffer size to read from is greater than zero.

12. The programmable hardware system of claim 11, wherein the buffer size to read from is tracked using a tail pointer maintained by the streaming engine locally, and wherein the tail pointer is incremented each time the streaming engine reads from the buffer and wherein the buffer size to read from is the tail pointer minus a head pointer and result thereof modulo actual size of the buffer, wherein the head pointer is maintained by the core locally and incremented each time the core writes to the buffer.

13. The programmable hardware system of claim 1, wherein the core maintains a head pointer where the core writes to and wherein the streaming engine maintains a tail pointer where the streaming engine reads from, and wherein the head pointer and the tail pointer are stored in a memory mapped input/output (MMIO) space that is mapped into registers in the streaming engine.

14. The programmable hardware system of claim 1, wherein the buffer is a circular buffer allocated in a DDR memory, and wherein a size of the buffer is fixed a-priori at compile time.

15. A programmable hardware system for machine learning (ML), comprising: a core configured to receive a plurality of commands and a plurality of data from a host to be analyzed and inferred via machine learning, wherein the core is configured to divide the plurality of commands into a first subset of commands associated with performance-critical operations and a second subset of commands associated with performance-noncritical operations, wherein the performance-critical operations include at least one or more of a matrix operation, tanh operation, sigmoid operation, memory transpose operation, addition operations, and operations on one or more of any of a tree, a graph, and a priority queue, wherein the performance-noncritical operations include at least one or more of data collection and data mapping, wherein the performance-critical operations exclude any of data collection and data mapping, wherein the performance-noncritical operations exclude any of a matrix operation, tanh operation, sigmoid operation, memory transpose operation, addition operations, and operations on one or more of a tree, a graph, and a priority queue and wherein the core is further configured to transmit the first subset of commands of the plurality of commands that is performance-critical operations and associated data thereof of the plurality of data for processing, wherein the first subset of commands and the associated data are passed through via a function call, wherein the second subset of commands associated with performance-noncritical operations is …
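The buffer handshake in claims 6 to 14 (a core that writes at a head pointer, a streaming engine that reads at a tail pointer, and an available-space count the core refreshes only when it drops below a threshold) can be sketched roughly as below. This is an illustrative model, not the patented implementation: `RingBuffer` and its methods are hypothetical names, a Python list stands in for the DDR buffer, and plain attributes stand in for the MMIO registers. It also swaps in the conventional single-producer ring arithmetic, with one slot kept empty, free space computed as (tail - head - 1) mod size, and readable count as (head - tail) mod size, which differs from the literal formula wording in claims 10 and 12.

```python
class RingBuffer:
    """Single-producer, single-consumer circular buffer (illustrative only)."""

    def __init__(self, size, threshold=1):
        self.size = size               # fixed up front, echoing claim 14
        self.slots = [None] * size     # stand-in for the DDR-resident buffer
        self.head = 0                  # next slot the core writes
        self.tail = 0                  # next slot the streaming engine reads
        self.threshold = threshold
        self.available = size - 1      # core's local, possibly stale, free count

    def _refresh_available(self):
        # Stand-in for the core reading head and tail from MMIO registers.
        # One slot stays empty so a full buffer differs from an empty one.
        self.available = (self.tail - self.head - 1) % self.size

    def write(self, item):
        """Core side: returns False when the buffer is full."""
        if self.available < self.threshold:
            self._refresh_available()
        if self.available == 0:
            return False
        self.slots[self.head] = item
        self.head = (self.head + 1) % self.size
        self.available -= 1            # decremented on every write
        return True

    def readable(self):
        """Streaming-engine side: number of entries waiting to be read."""
        return (self.head - self.tail) % self.size

    def read(self):
        """Streaming-engine side: returns None when nothing is waiting."""
        if self.readable() == 0:
            return None
        item = self.slots[self.tail]
        self.tail = (self.tail + 1) % self.size
        return item


# Usage: a 4-slot buffer holds 3 entries because one slot stays empty.
buf = RingBuffer(size=4)
writes = [buf.write(c) for c in ["matmul", "tanh", "sigmoid", "add"]]
first = buf.read()                     # frees one slot
retry = buf.write("add")               # succeeds after the refresh
drained = [buf.read() for _ in range(4)]
```

Because the core's local count only ever underestimates the free space, it can keep writing without touching the shared pointers until the count runs low, which is the point of the threshold check.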

Assignees

Cavium Llc, Marvell Asia Pte Ltd

Classifications

  • System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)}

  • from multiple instruction streams, e.g. multistreaming

  • for non-native instruction set, e.g. Javabyte, legacy code

  • Inference or reasoning models

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12112174B2 cover?
A programmable hardware system for machine learning (ML) includes a core and a streaming engine. The core receives a plurality of commands and a plurality of data from a host to be analyzed and inferred via machine learning. The core transmits a first subset of commands of the plurality of commands that is performance-critical operations and associated data thereof of the plurality of data for …
Who is the assignee on this patent?
Cavium Llc, Marvell Asia Pte Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/30174. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 8, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).