Vector reductions using shared scratchpad memory

US11934826B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11934826-B2
Application number	US-202117530869-A
Country	US
Kind code	B2
Filing date	Nov 19, 2021
Priority date	Feb 26, 2020
Publication date	Mar 19, 2024
Grant date	Mar 19, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.
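As a rough illustration of the data flow in the abstract, the Python sketch below models the shared memory as an object that receives one vector per core and folds it into a stored result with a configurable operator. It is a software analogy only. The names `SharedScratchpad`, `dma_write_accumulate`, `core_compute`, and `reduce_across_cores` are hypothetical and do not come from the patent, which describes a hardware circuit.

```python
import operator
from typing import Callable, List

Vector = List[float]


class SharedScratchpad:
    """Toy model of a shared memory with an attached operator unit (hypothetical)."""

    def __init__(self, length: int, op: Callable[[float, float], float], identity: float):
        # The arithmetic operation is fixed at the memory side, not at the cores.
        self.op = op
        self.cell: Vector = [identity] * length

    def dma_write_accumulate(self, vec: Vector) -> None:
        """Combine an incoming vector with the stored one, element by element."""
        if len(vec) != len(self.cell):
            raise ValueError("vector length must match the memory cell")
        self.cell = [self.op(a, b) for a, b in zip(self.cell, vec)]


def core_compute(core_id: int, length: int) -> Vector:
    """Stand-in for the computation each core performs (made-up values)."""
    return [float(core_id + i) for i in range(length)]


def reduce_across_cores(
    num_cores: int,
    length: int,
    op: Callable[[float, float], float] = operator.add,
    identity: float = 0.0,
) -> Vector:
    """Each core sends its vector to the shared memory; the result vector is read back."""
    memory = SharedScratchpad(length, op, identity)
    for core_id in range(num_cores):
        memory.dma_write_accumulate(core_compute(core_id, length))
    return memory.cell
```

With four cores and vectors of length three, `reduce_across_cores(4, 3)` returns `[6.0, 10.0, 14.0]`. Passing `op=max, identity=float("-inf")` returns `[3.0, 4.0, 5.0]`, which shows how a different encoded operation changes the reduction without changing the cores.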

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed using an integrated circuit for a hardware machine-learning accelerator that includes a plurality of cores and a shared memory that communicates with each of the plurality of cores, the method comprising: generating, by each of the plurality of cores, a respective vector of values; performing, across the plurality of cores and into a shared memory cell in the shared memory, a plurality of atomic vector reductions using each of the respective vectors and an operator unit of the shared memory without synchronization; and generating a result vector based on the plurality of atomic vector reductions.
2. The method of claim 1, wherein performing the plurality of atomic vector reductions comprises: accumulating a first vector stored in the shared memory cell with a respective second vector generated by one or more of the plurality of cores.
3. The method of claim 1, wherein: each of the plurality of cores comprises a respective vector-processing unit; and generating a respective vector of values comprises: generating, by each of the vector-processing units, a respective vector of values.
4. The method of claim 3, wherein each of the operator unit and the shared memory is external to the respective vector-processing unit in each of the plurality of cores.
5. An integrated circuit for a hardware machine-learning accelerator, the integrated circuit comprising: a plurality of cores; a shared memory that communicates with each of the plurality of cores; and a non-transitory machine-readable storage device for storing instructions that are executable by a processor to cause performance of operations comprising: generating, by each of the plurality of cores, a respective vector of values; performing, across the plurality of cores and into a shared memory cell in the shared memory, a plurality of atomic vector reductions using each of the respective vectors and an operator unit of the shared memory without synchronization; and generating a result vector based on the plurality of atomic vector reductions.
6. The integrated circuit of claim 5, wherein performing the plurality of atomic vector reductions comprises: accumulating a first vector stored in the shared memory cell with a respective second vector generated by one or more of the plurality of cores.
7. The integrated circuit of claim 5, wherein: each of the plurality of cores comprises a respective vector-processing unit; and generating a respective vector of values comprises: generating, by each of the vector-processing units, a respective vector of values.
8. The integrated circuit of claim 7, wherein each of the operator unit and the shared memory is external to the respective vector-processing unit in each of the plurality of cores.
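The claims refer to atomic vector reductions into a single shared memory cell, performed without synchronization. One way to picture this, offered as an assumption rather than as the patent's definition, is that each read-modify-write is made indivisible at the memory side, so the cores never coordinate with one another. The sketch below uses Python threads as stand-ins for cores and a lock inside the cell as a stand-in for hardware atomicity. `AtomicVectorCell`, `atomic_add`, and `run_cores` are hypothetical names.

```python
import threading
from typing import List


class AtomicVectorCell:
    """Toy shared memory cell whose accumulate step is indivisible (hypothetical)."""

    def __init__(self, length: int):
        self._values = [0] * length
        # The lock lives inside the cell; it models atomicity provided by the
        # memory, not a barrier or handshake between cores.
        self._lock = threading.Lock()

    def atomic_add(self, vec: List[int]) -> None:
        with self._lock:
            for i, v in enumerate(vec):
                self._values[i] += v

    def read(self) -> List[int]:
        with self._lock:
            return list(self._values)


def run_cores(num_cores: int, length: int, rounds: int) -> List[int]:
    """Each simulated core pushes its vector `rounds` times, in no agreed order."""
    cell = AtomicVectorCell(length)

    def core(core_id: int) -> None:
        for _ in range(rounds):
            cell.atomic_add([core_id + 1] * length)

    threads = [threading.Thread(target=core, args=(c,)) for c in range(num_cores)]
    for t in threads:
        t.start()
    # The host waits for completion before reading; the cores themselves never
    # wait on each other.
    for t in threads:
        t.join()
    return cell.read()
```

Because integer addition is commutative and associative, the result does not depend on the order in which the cores arrive: `run_cores(4, 3, 100)` returns `[1000, 1000, 1000]` on every run.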

Assignees

Google LLC

Inventors

Classifications

G06F9/30036 · Primary
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06F9/30038
using a mask · CPC title
G06F9/3001
Arithmetic instructions · CPC title
G06F9/3004
to perform operations on memory · CPC title
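For readers unfamiliar with CPC symbols, the sketch below splits a symbol such as `G06F9/30036` into its section, class, subclass, main group, and subgroup, and looks up the section title (section G is Physics). The `parse_cpc` helper and the `CpcSymbol` type are hypothetical, the regular expression is a simplified assumption about the symbol layout, and the section titles are abbreviated.

```python
import re
from typing import NamedTuple

# Abbreviated CPC section titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# Simplified pattern: section letter, two-digit class, subclass letter,
# main group digits, a slash, then subgroup digits.
_CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


class CpcSymbol(NamedTuple):
    section: str
    cpc_class: str
    subclass: str
    main_group: str
    subgroup: str

    @property
    def section_title(self) -> str:
        return CPC_SECTIONS[self.section]


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol written without spaces, e.g. 'G06F9/30036'."""
    match = _CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a recognised CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())
```

For example, `parse_cpc("G06F9/30036")` yields section `G`, class `06`, subclass `F`, main group `9`, and subgroup `30036`, and its `section_title` is `"Physics"`.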

Patent family

Related publications grouped by family.

View patent family 77366021

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11934826B2 cover?

Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the r…

Who is the assignee on this patent?

Google LLC

What technology area does this patent fall under?

The primary CPC classification is G06F9/30036. Mapped technology areas include Physics.

When was this patent published?

The publication date is Mar 19, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).