Two-stage vector reduction using two-dimensional and one-dimensional systolic arrays

US2016267111A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016267111-A1
Application number: US-201514715557-A
Country: US
Kind code: A1
Filing date: May 18, 2015
Priority date: Mar 11, 2015
Publication date: Sep 15, 2016
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples of the disclosure efficiently process data sets. In some examples, a plurality of first processor elements process a first data set (e.g., an image) and a second data set (e.g., a kernel) using a first function to generate a third data set. The third data set is processed using a second function to generate an output element. The first processor elements are arranged in a two-dimensional systolic array such that one or more first processor elements receive input from a first adjacent first processor element and transmit output to a second adjacent first processor element. A plurality of second processor elements aggregate the output element to at least partially generate a fourth data set. The plurality of second processor elements are arranged in a one-dimensional array. Aspects of the disclosure facilitate increasing speed, conserving memory, reducing processor load or an amount of energy consumed, and/or reducing network bandwidth usage.
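
The data flow in the abstract can be sketched in software as follows. This is only an illustrative sketch, not the patented hardware: it assumes the first function is an element-wise multiply of an image window by the kernel and the second function is a sum, which the abstract does not specify. The names `first_function`, `second_function`, and `two_stage_filter` are hypothetical.

```python
# Illustrative sketch only. Assumed (not stated in the abstract):
#   first function  = element-wise multiply of image window and kernel
#   second function = sum of the resulting products
# All function names here are hypothetical.

def first_function(window, kernel):
    """Combine a first data set (image window) with a second data set
    (kernel) to produce a third data set of element-wise products."""
    return [[p * w for p, w in zip(window_row, kernel_row)]
            for window_row, kernel_row in zip(window, kernel)]


def second_function(products):
    """Reduce the third data set to a single output element."""
    return sum(sum(row) for row in products)


def two_stage_filter(image, kernel):
    """Aggregate one output element per window position into a
    fourth data set (the filtered image, 'valid' positions only)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    fourth = []
    for r in range(out_h):
        out_row = []
        for c in range(out_w):
            # Slice the image window that lines up with the kernel.
            window = [image_row[c:c + kw] for image_row in image[r:r + kh]]
            third = first_function(window, kernel)
            out_row.append(second_function(third))
        fourth.append(out_row)
    return fourth


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    kernel = [[1, 0],
              [0, 1]]
    print(two_stage_filter(image, kernel))
```

In this sketch the nested loops stand in for the two-dimensional array, and the list that collects the results stands in for the one-dimensional aggregation stage; real hardware would perform these steps in parallel rather than sequentially.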

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: a plurality of first processor elements configured to process a first data set and a second data set using a first function to generate a third data set, and process the third data set using a second function to generate an output element, the plurality of first processor elements arranged in a two-dimensional systolic array such that one or more first processor elements of the plurality of first processor elements are configured to receive input from one or more first adjacent first processor elements and transmit output to one or more second adjacent first processor elements; and a plurality of second processor elements configured to aggregate the output element to at least partially generate a fourth data set, the plurality of second processor elements arranged in a one-dimensional array.
2. The system of claim 1, further comprising a sensor module configured to capture data corresponding to one or more images, and transmit the one or more images towards the plurality of first processor elements, the first data set associated with the one or more images.
3. The system of claim 1, wherein the second data set is associated with a filter.
4. The system of claim 1, wherein the plurality of first processor elements are configured to retrieve the first data set from a memory area, the first data set and the third data set processed locally at the system without transmitting data to or retrieving additional data from the memory area.
5. The system of claim 1, wherein the plurality of first processor elements are arranged in a plurality of rows, a first row of the plurality of rows associated with a first element of the first data set.
6. The system of claim 1, wherein the plurality of first processor elements are arranged in a plurality of columns, a first column of the plurality of columns associated with a first element of the second data set.
7. The system of claim 1, wherein one or more first processor elements of the plurality of first processor elements are configured to sequentially process a plurality of elements included in the first data set.
8. The system of claim 1, wherein one or more first processor elements of the plurality of first processor elements are configured to process a first element included in the first data set sequentially using a plurality of second elements included in the second data set.
9. The system of claim 1, wherein one or more of the plurality of first processor elements and the plurality of second processor elements are modifiable to modify a rate at which one or more of the output element and the fourth data set are generated.
10. A method of processing a data set using a processor module including a two-dimensional array and a one-dimensional array, the two-dimensional array including a plurality of first processor elements, the one-dimensional array including a plurality of second processor elements, the method comprising: processing, at the two-dimensional array, a first data set and a second data set using a first function to generate a third data set, one or more processor elements of the two-dimensional array receiving input from one or more first adjacent processor elements of the two-dimensional array and transmitting output to one or more second adjacent processor elements of the two-dimensional array; processing, at the two-dimensional array, the third data set using a second function to generate an output element; and aggregating, at the one-dimensional array, the output element to at least partially generate a fourth data set.
11. The method of claim 10, further comprising generating, at a sensor module, one or more images associated with the first data set.
12. The method of claim 10, further comprising: retrieving the first data set from a memory area; and locally processing the first data set and the third data set at the processor module without transmitting data to or retrieving additional data from the memory area.
13. The method of claim 10, wherein processing a first data set comprises sequentially processing a plurality of elements included in the first data set.
14. The method of claim 10, wherein processing a first data set comprises processing a first element included in the first data set sequentially using a plurality of second elements included in the second data set.
15. The method of claim 10, wherein processing the third data set comprises generating, at one or more processor elements of the two-dimensional array, a respective output element per clock cycle.
16. A mobile device comprising: a sensor module configured to capture data corresponding to an image; a memory area storing computer-executable instructions for processing a first data set associated with the image; a first processor array configured to execute the computer-executable instructions to: apply a first function to the first data set using a second data set to generate a third data set; and apply a second function to the third data set to generate an output element, one or more processor elements of the first processor array configured to receive input from one or more first adjacent processor elements and transmit output to one or more second adjacent processor elements; and a second processor array configured to execute the computer-executable instructions to aggregate the output element to at least partially generate a fourth data set.
17. The mobile device of claim 16, wherein the first processor array is configured to retrieve the first data set from a memory area, the first data set and the third data set processed locally at the mobile device without transmitting data to or retrieving additional data from the memory area.
18. The mobile device of claim 16, wherein the first processor array is arranged in a plurality of rows and a plurality of columns, one or more rows of the plurality of rows associated with a respective element of one or more first data sets, and one or more columns of the plurality of columns associated with a respective element of the second data set.
19. The mobile device of claim 16, wherein one or more processor elements of the first processor array are configured to sequentially process a plurality of elements included in the first data set.
20. The mobile device of claim 16, wherein one or more processor elements of the first processor array are configured to process a first element included in the first data set sequentially using a plurality of second elements included in the second data set.
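
The neighbor-to-neighbor data movement recited in claims 1 and 10, and the one-output-per-clock-cycle behavior of claim 15, can be illustrated with a small cycle-by-cycle simulation. This is a simplified sketch under several assumptions that are not taken from the claims: it models a single row of processor elements rather than a full two-dimensional array, each element holds one kernel value (loosely following the column association of claim 6), the input sample is broadcast to every element each cycle, and the operation is a multiply-accumulate. The name `systolic_correlate` is hypothetical.

```python
# Simplified, hypothetical simulation of one row of a systolic array.
# Assumptions (not from the claims): multiply-accumulate operation,
# one stationary kernel value per processor element (PE), and the
# current input sample broadcast to all PEs each clock cycle.

def systolic_correlate(signal, kernel):
    """Sliding dot product computed by passing partial sums from each
    PE to its right-hand neighbor once per simulated clock cycle."""
    k = len(kernel)
    regs = [0] * k  # regs[j] holds the partial sum latched by PE j
    outputs = []
    for t, x in enumerate(signal):
        # Every PE reads its left neighbor's value from the previous
        # cycle, adds its own product, and latches the result.
        regs = [(regs[j - 1] if j > 0 else 0) + kernel[j] * x
                for j in range(k)]
        # Once the pipeline is full, the last PE emits one completed
        # output element on every cycle.
        if t >= k - 1:
            outputs.append(regs[-1])
    return outputs


def direct_correlate(signal, kernel):
    """Reference result computed without any pipelining."""
    k = len(kernel)
    return [sum(kernel[j] * signal[n + j] for j in range(k))
            for n in range(len(signal) - k + 1)]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5]
    kernel = [1, 0, -1]
    print(systolic_correlate(signal, kernel))  # [-2, -2, -2]
    print(direct_correlate(signal, kernel))    # [-2, -2, -2]
```

After an initial latency of `len(kernel) - 1` cycles, the simulated row produces one output per cycle, and the outputs match the direct computation. Extending the sketch to several rows, each fed a different stream of the first data set, would approximate the row and column arrangement of claim 18, though the claims do not prescribe this particular layout.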

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • G06T1/20 (Primary)

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Physics · mapped topic

  • G06F16/211 (Primary)

    Schema design and management · CPC title

  • Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016267111A1 cover?
Examples of the disclosure efficiently process data sets. In some examples, a plurality of first processor elements process a first data set (e.g., an image) and a second data set (e.g., a kernel) using a first function to generate a third data set. The third data set is processed using a second function to generate an output element. The first processor elements are arranged in a two-dimens…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 15, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).