Computational array microprocessor system using non-consecutive data formatting

US11157441B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11157441-B2
Application number  US-201815920173-A
Country             US
Kind code           B2
Filing date         Mar 13, 2018
Priority date       Jul 24, 2017
Publication date    Oct 26, 2021
Grant date          Oct 26, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A microprocessor system comprises a computational array and a hardware data formatter. The computational array includes a plurality of computation units that each operates on a corresponding value addressed from memory. The values operated by the computation units are synchronously provided together to the computational array as a group of values to be processed in parallel. The hardware data formatter is configured to gather the group of values, wherein the group of values includes a first subset of values located consecutively in memory and a second subset of values located consecutively in memory. The first subset of values is not required to be located consecutively in the memory from the second subset of values.
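The gathering step described in the abstract can be illustrated with a minimal sketch. It models memory as a flat Python list and is not the patented hardware: the function name `gather_group` and the parameters `subset_starts` and `subset_len` are hypothetical names chosen for this example. Each subset is a consecutive run of values, while the runs themselves need not be adjacent to one another.

```python
def gather_group(memory, subset_starts, subset_len):
    """Collect one group of values from several consecutive runs in memory.

    memory        -- flat list standing in for addressable memory (assumption)
    subset_starts -- start address of each subset (hypothetical parameter)
    subset_len    -- number of consecutive values in each subset (hypothetical)
    """
    group = []
    for start in subset_starts:
        # Values inside one subset are consecutive in memory.
        group.extend(memory[start:start + subset_len])
    return group


# The value stored at address a is 100 + a, so addresses are easy to read back.
memory = list(range(100, 132))

# Two subsets of four values each; the second starts at address 16,
# so it does not follow the first subset consecutively.
group = gather_group(memory, subset_starts=[0, 16], subset_len=4)
print(group)  # [100, 101, 102, 103, 116, 117, 118, 119]
```

In this sketch the whole group is returned as one list, which loosely mirrors the abstract's statement that the values are provided together to the computational array.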

First claim

Opening claim text (preview).

What is claimed is:
1. A microprocessor system, comprising: a computational array that includes a plurality of computation units, wherein each of the plurality of computation units operates on a corresponding value addressed from memory and the values operated by the plurality of computation units are synchronously provided together to the computational array as a group of values to be processed in parallel, the group of values being utilized as a first input to the computational array; and a hardware data formatter configured to gather the group of values based on a data formatting operation, the data formatting operation identifying at least a stride, and the hardware data formatter comprising a plurality of read buffers configured to store respective subsets of the values, wherein each subset corresponds to values located consecutively in the memory, wherein a number of values from each subset is determined based on the stride, the number of values indicating values of each subset which are to be utilized for processing based on the stride, wherein remaining values of each subset are not utilized, wherein the group of values includes the values of each subset which are to be utilized and the remaining values of each subset which are not utilized, wherein the group of values are provided, by the hardware data formatter, to the computational array, and wherein the computational array disables particular computation units corresponding to the remaining values of each subset which are not utilized.
2. The system of claim 1, wherein the subsets of the values include a first subset of values located consecutively in the memory and a second subset of values located consecutively in the memory, wherein the first subset of values is not located consecutively in the memory from the second subset of values, and wherein a difference in memory address between the first subset of values and the second subset of values is based on the stride.
3. The system of claim 1, wherein the computational array is configured to receive at least two vector input operands.
4. The system of claim 1, wherein the computational array is configured to perform a dot-product component operation using the group of values in parallel.
5. The system of claim 1, wherein each computation unit of the plurality of computation units includes an arithmetic logic unit, an accumulator, and a shadow register.
6. The system of claim 1, wherein the first input corresponds to an input channel of vision data.
7. The system of claim 1, wherein the first input corresponds to sensor data.
8. The system of claim 7, wherein the sensor data is non-image sensor data.
9. The system of claim 8, wherein the non-image sensor data includes ultrasonic, radar, or LiDAR data.
10. The system of claim 1, wherein the first input corresponds to a convolution filter.
11. The system of claim 10, wherein the convolution filter is constructed to identify features of a second input which corresponds to sensor data.
12. The system of claim 2, wherein the first subset of values is retrieved from a cache using a single cache read.
13. The system of claim 2, wherein the first subset of values and the second subset of values are retrieved from a single cache line.
14. The system of claim 1, wherein the memory is configured to dynamically adjust an allocation between a first portion of the memory for a data input and a second portion of the memory for a weight input.
15. The system of claim 2, wherein the hardware data formatter is configured to determine a corresponding start memory address for each of the first subset and the second subset.
16. The system of claim 15, wherein the hardware data formatter is configured to determine a corresponding end memory address for each of the first subset of values and the second subset of values.
17. The system of claim 15, wherein a cache check is performed for each of the first subset and the second subset including by determining whether a value stored at the determined starting memory addresses for the first subset has been cached and determining whether a value stored at the determined starting memory addresses for the second subset has been cached.
18. The system of claim 2, wherein a cache check is performed for the first subset including by determining whether a first value and a last value for the first subset are stored in a cache.
19. The system of claim 1, wherein the data formatting operation further indicates a padding parameter.
20. A method comprising: receiving a computational operation; receiving a data formatting operation at a hardware data formatter, the data formatting operation indicating at least a stride, wherein the hardware data formatter comprises a plurality of read buffers; retrieving a first group of values associated with an input data, wherein the first group of values includes a first subset of values located consecutively in a memory and a second subset of values located consecutively in the memory, and the first subset of values is not located consecutively in the memory from the second subset of values, wherein a number of values from the first subset is determined based on the stride, the number of values indicating values of the first subset which are to be utilized for the computational operation based on the stride, and wherein remaining values of the first subset are not utilized; retrieving a second group of values associated with a weight data; providing in parallel the first group of values and the second group of values to a computational array microprocessor comprising a plurality of computational units arranged as a matrix, wherein the computational array disables particular computation units corresponding to the remaining values of the first subset which are not utilized; and processing the first group of values and the second group of values as operands in parallel using the computational array.
21. A microprocessor system, comprising: a computational array that includes a plurality of computation units, wherein each of the plurality of computation units operates on a corresponding value addressed from memory and the values operated by the plurality of computation units are synchronously provided together to the computational array as a group of values, wherein the group of values includes at least 96 values and the group of values includes at least 12 subsets of values; a hardware data formatter configured to gather the group of values based on a data formatting operation, the data formatting operation identifying at least a stride, and the hardware data formatter comprising a plurality of read buffers configured to store the at least 12 subsets, wherein the group of values includes a first subset of values located consecutively in the memory and a second subset of values located consecutively in the memory, and the first subset of values is not required to be located consecutively in the memory from the second subset of values, wherein a number of values from the first subset is determined based on the stride, the number of values indicating values of the first subset which are to be utilized for the computational operation, and wherein remaining values of the first subset are not utilized, wherein the group of values includes the values of the first subset which are to be utilized based on the stride and the remaining values of the first subset which are not utilized, wherein the group of values are provided, by the hardware data formatter, to the computational
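The interplay of stride, read buffers, and disabled computation units recited in claims 1, 2, and 4 can be sketched in software. This is an illustrative model under stated assumptions, not the claimed hardware. The claims say only that the number of utilized values and the address difference between subsets are "based on the stride", so the rules used here (`subset_len // stride` values utilized, subsets `stride * pitch` addresses apart) are hypothetical, as are the names `format_group`, `masked_dot`, `pitch`, and `enabled`.

```python
def format_group(memory, base, num_subsets, subset_len, stride, pitch):
    """Fill one read buffer per subset and flag which values are utilized.

    HYPOTHETICAL rules (the claims do not specify them):
      * subset i starts at base + i * stride * pitch
      * the first subset_len // stride values of each subset are utilized
    """
    used = max(1, subset_len // stride)
    values, enabled = [], []
    for i in range(num_subsets):
        start = base + i * stride * pitch
        # One read buffer: values located consecutively in memory.
        values.extend(memory[start:start + subset_len])
        # Utilized and non-utilized values both stay in the group;
        # the flag marks which computation units remain enabled.
        enabled.extend(j < used for j in range(subset_len))
    return values, enabled


def masked_dot(values, weights, enabled):
    """Dot product in which disabled units contribute nothing."""
    return sum(v * w for v, w, e in zip(values, weights, enabled) if e)


memory = list(range(64))  # the value at address a is a
values, enabled = format_group(memory, base=0, num_subsets=2,
                               subset_len=4, stride=2, pitch=8)
print(values)   # [0, 1, 2, 3, 16, 17, 18, 19]
print(enabled)  # [True, True, False, False, True, True, False, False]
print(masked_dot(values, [1] * 8, enabled))  # 34
```

The group passed to `masked_dot` still contains the non-utilized values, as in claim 1; skipping them via the `enabled` flags stands in for the array disabling the corresponding computation units.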

Assignees

Tesla Inc

Inventors

Classifications

  • Activation functions · CPC title

  • Combinations of networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06F7/5443 · Primary

    Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

  • Neural networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11157441B2 cover?
A microprocessor system comprises a computational array and a hardware data formatter. The computational array includes a plurality of computation units that each operates on a corresponding value addressed from memory. The values operated by the computation units are synchronously provided together to the computational array as a group of values to be processed in parallel. The hardware data f…
Who is the assignee on this patent?
Tesla Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/5443. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 26, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).