Who is the assignee on this patent?

Cavium Llc, Marvell Asia Pte Ltd

What technology area does this patent fall under?

Primary CPC classification G06F9/3877. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jun 8, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Architecture for irregular operations in machine learning inference engine

US11029963B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11029963-B2
Application number	US-201816226559-A
Country	US
Kind code	B2
Filing date	Dec 19, 2018
Priority date	Feb 8, 2018
Publication date	Jun 8, 2021
Grant date	Jun 8, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing unit of an inference engine for machine learning (ML) includes a first data load streamer, a second data load streamer, an operator component, and a store streamer. The first data load streamer streams a first data stream from an on-chip memory (OCM) to the operator component. The second data load streamer streams a second data stream from the OCM to the operator component. The operator component performs a matrix operation on the first data stream and the second data stream. The store streamer receives a data output stream from the operator component and stores the data output stream in a buffer.
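The dataflow in the abstract can be modeled in software as a rough illustration. The sketch below is not the patented hardware: the OCM is assumed to be a plain Python list, the streamers are assumed to be generators, and the names `load_streamer`, `operator_component`, `store_streamer`, and `run_unit` are hypothetical. Element-wise addition stands in for the operator because the claims list addition of the two streams among the supported operations.

```python
from typing import Callable, Iterable, Iterator, List


def load_streamer(ocm: List[int], start: int, count: int) -> Iterator[int]:
    """Yield `count` consecutive values from the OCM model, starting at `start`."""
    for offset in range(count):
        yield ocm[start + offset]


def operator_component(
    first: Iterable[int],
    second: Iterable[int],
    op: Callable[[int, int], int],
) -> Iterator[int]:
    """Combine the two input streams element by element as they arrive."""
    for x, y in zip(first, second):
        yield op(x, y)


def store_streamer(stream: Iterable[int]) -> List[int]:
    """Drain the output stream into a buffer (modeled here as a list)."""
    return list(stream)


def run_unit(
    ocm: List[int],
    start_a: int,
    start_b: int,
    count: int,
    op: Callable[[int, int], int],
) -> List[int]:
    """Wire two load streamers, one operator, and one store streamer together."""
    first = load_streamer(ocm, start_a, count)
    second = load_streamer(ocm, start_b, count)
    return store_streamer(operator_component(first, second, op))


if __name__ == "__main__":
    ocm = [1, 2, 3, 4, 10, 20, 30, 40]
    print(run_unit(ocm, 0, 4, 4, lambda x, y: x + y))
```

Because each stage is a generator, every input value is read once and passed straight through, which loosely mirrors the streaming behavior the abstract describes.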

First claim

Opening claim text (preview).

What is claimed is:

1. A processing unit of an inference engine for machine learning (ML), comprising: a first data load streamer configured to stream a first data stream comprising a first plurality of data sections from an on-chip memory (OCM), using a single instruction, to an operator component by using an address of the OCM and a pattern of data to be loaded for the first data stream to be read and streamed; a second data load streamer configured to stream a second data stream comprising a second plurality of data sections from the OCM, using a single instruction, to the operator component by using an address of the OCM and a pattern of data to be loaded for the second data stream to be read and streamed; the operator component configured to perform a data operation on the first data stream and the second data stream; and a store streamer configured to receive a data output stream from the operator component and to store the data output stream in a buffer, wherein the pattern of data to be loaded for the first data stream includes a stride to a next block and a stride between lines, wherein the first data stream pattern is specified by one or more of a starting address, number of lines to read for each operation, number of bytes per line, and a number of blocks to read.

2. The processing unit of claim 1, wherein the data operation is a matrix multiplication operation and is selected from a group consisting of determining a maximum value, calculating an average value for a stream of data, calculating an addition of the first data stream to the second data stream, calculating a multiplication of the first data stream to the second data stream, rewriting the first data stream in a different pattern for matrix transformation, Tanh operation, Sigmoid operation, spatial batch normalization operation, and local response normalization.

3. The processing unit of claim 1 further comprising an instruction controller configured to store instructions received from a core engine.

4. The processing unit of claim 1, wherein the buffer is configured to stream the data output stream to the OCM for storage thereof.

5. The processing unit of claim 1, wherein the data output stream is specified by one or more of a starting address, a number of lines to write, line stride between lines, a number of bytes per line, and stride to a next block.

6. The processing unit of claim 1, wherein the first data load streamer, the second data load streamer, the operator component, and the store streamer are configured to iteratively execute and process data until a termination condition is met.

7. A processing unit of an inference engine for machine learning (ML), comprising: a first data load streamer configured to stream a first data stream comprising a first plurality of data sections from an on-chip memory (OCM), using a single instruction, to an operator component by using an address of the OCM and a pattern of data to be loaded for the first data stream to be read and streamed; a second data load streamer configured to stream a second data stream comprising a second plurality of data sections from the OCM, using a single instruction, to the operator component by using an address of the OCM and a pattern of data to be loaded for the second data stream to be read and streamed; the operator component configured to perform a matrix operation on the first data stream and the second data stream, wherein the matrix operation is performed by another processing unit that reads data within each matrix only once and wherein the another processing unit is configured to receive data within the each matrix as a data stream using a single instruction and further configured to operate on the each matrix as the data stream using a single instruction to generate an output matrix; and a store streamer configured to receive a data output stream from the operator component and to store the data output stream in a buffer, wherein the pattern of data to be loaded for the first data stream includes a stride to a next block and a stride between lines, wherein the data output stream is specified by a starting address, a number of lines to write, line stride between lines, a number of bytes per line, and stride to a next block.

8. The processing unit of claim 7, wherein the matrix operation is a matrix multiplication operation and is selected from a group consisting of determining a maximum value, calculating an average value for a stream of data, calculating an addition of the first data stream to the second data stream, calculating a multiplication of the first data stream to the second data stream, rewriting the first data stream in a different pattern for matrix transformation, Tanh operation, Sigmoid operation, spatial batch normalization operation, and local response normalization.

9. The processing unit of claim 7 further comprising an instruction controller configured to store instructions received from a core engine.

10. The processing unit of claim 7, wherein the first data stream pattern is specified by a starting address, number of lines to read for each operation, number of bytes per line, and a number of blocks to read.

11. The processing unit of claim 7, wherein the buffer is configured to stream the data output stream to the OCM for storage thereof.

12. The processing unit of claim 7, wherein the first data load streamer, the second data load streamer, the operator component, and the store streamer are configured to iteratively execute and process data until a termination condition is met.

13. A method comprising: streaming a first data stream comprising a first plurality of data sections from an on-chip memory (OCM), using a single instruction, to an operator component by using an address of the OCM and a pattern of data to be loaded for the first data stream to be read and streamed; streaming a second data stream comprising a second plurality of data sections from the OCM to the operator component by using an address of the OCM and a pattern of data to be loaded for the second data stream to be read and streamed; performing a data operation on the first data stream and the second data stream; streaming a data output stream resulting from the performing; and storing the data output stream, wherein the pattern of data to be loaded for the first data stream includes a stride to a next block and a stride between lines, wherein the first data stream pattern is specified by a starting address, number of lines to read for each operation, number of bytes per line, and a number of blocks to read.

14. The method of claim 13, wherein the data operation is a matrix multiplication and is selected from a group consisting of determining a maximum value, calculating an average value for a stream of data, calculating an addition of the first data stream to the second data stream, calculating a multiplication of the first data stream to the second data stream, rewriting the first data stream in a different pattern for matrix transformation, Tanh operation, Sigmoid operation, spatial batch normalization operation, and local response normalization.

15. The method of claim 13 further comprising storing instructions received from a core engine.

16. The method of claim 13, wherein the data output stream is specified by a starting address, a number of lines to write, line stride between lines, a number of bytes per line, and stride to a next block.

17. The method of claim 13 further comprising iteratively repeating the streaming the first data stream, the streaming the second data stream, the performing the po
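The claims describe a load pattern given by a starting address, a number of lines, a number of bytes per line, a stride between lines, a number of blocks, and a stride to the next block. As a hedged illustration of how such a descriptor could expand into addresses, the sketch below makes several assumptions that do not come from the patent text: the OCM is modeled as a flat `bytes` object, "number of lines to read for each operation" is interpreted as lines per block, strides are measured in bytes from the start of one line or block to the start of the next, and the names `StreamPattern`, `line_addresses`, and `stream_bytes` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class StreamPattern:
    """Hypothetical descriptor for one strided read from a flat OCM model."""
    start_address: int    # byte offset of the first line
    lines_per_block: int  # assumed meaning of "number of lines to read for each operation"
    bytes_per_line: int   # contiguous bytes read from each line
    line_stride: int      # bytes from the start of one line to the next
    num_blocks: int       # how many blocks to read
    block_stride: int     # bytes from the start of one block to the next


def line_addresses(p: StreamPattern) -> Iterator[int]:
    """Yield the starting byte offset of every line, block by block."""
    for block in range(p.num_blocks):
        block_base = p.start_address + block * p.block_stride
        for line in range(p.lines_per_block):
            yield block_base + line * p.line_stride


def stream_bytes(ocm: bytes, p: StreamPattern) -> bytes:
    """Gather the bytes selected by the pattern into one contiguous stream."""
    out = bytearray()
    for addr in line_addresses(p):
        out += ocm[addr:addr + p.bytes_per_line]
    return bytes(out)


if __name__ == "__main__":
    ocm = bytes(range(64))
    pattern = StreamPattern(
        start_address=0, lines_per_block=2, bytes_per_line=4,
        line_stride=8, num_blocks=2, block_stride=32,
    )
    print(list(line_addresses(pattern)))
    print(list(stream_bytes(ocm, pattern)))
```

A single descriptor of this shape is enough to walk a rectangular tile out of a larger array, which is one plausible reading of why the claims pair the pattern with "a single instruction".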

Classifications

G06F9/3851
from multiple instruction streams, e.g. multistreaming
G06F15/7807
System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
G06F15/7864
on more than one IC chip
G06F9/3877 (Primary)
using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12)
G06N20/20
Ensemble learning

Patent family

Related publications grouped by family.

Patent family ID: 67475161
