Information processing apparatus
US-2024385843-A1 · Nov 21, 2024 · US
US2020150958A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020150958-A1 |
| Application number | US-201916671428-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 1, 2019 |
| Priority date | Nov 9, 2018 |
| Publication date | May 14, 2020 |
| Grant date | — |
Official abstract text for this publication.
A processor having a systolic array that can perform operations efficiently is provided. The processor includes multiple processing cores aligned in a matrix, and each of the processing cores includes an arithmetic unit array including multiple arithmetic units that can form a systolic array. Each of the processing cores includes a first memory that stores first data, a second memory that stores second data, a first multiplexer that connects a first input for receiving the first data at the arithmetic unit array to an output of the first memory in the processing core or an output of the arithmetic unit array in an adjacent processing core, and a second multiplexer that connects a second input for receiving the second data at the arithmetic unit array to an output of the second memory in the processing core or an output of the arithmetic unit array in an adjacent processing core.
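The input selection described in the abstract can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the names `Source`, `multiplexer`, `ProcessingCore`, and `array_inputs` are hypothetical and do not appear in the publication, and each memory is modeled as a plain list indexed by cycle rather than by a generated address.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Source(Enum):
    """Hypothetical select values for a two-input multiplexer."""
    LOCAL_MEMORY = "local_memory"
    ADJACENT_CORE = "adjacent_core"


def multiplexer(select: Source, local_value, adjacent_value):
    """Forward exactly one of the two candidate inputs."""
    return local_value if select is Source.LOCAL_MEMORY else adjacent_value


@dataclass
class ProcessingCore:
    """Assumed model: two memories and two multiplexers per core."""
    first_memory: List[int]
    second_memory: List[int]
    first_select: Source = Source.LOCAL_MEMORY
    second_select: Source = Source.LOCAL_MEMORY

    def array_inputs(
        self,
        cycle: int,
        adjacent_first: Optional[int] = None,
        adjacent_second: Optional[int] = None,
    ) -> Tuple[Optional[int], Optional[int]]:
        """Return the (first, second) values presented to the arithmetic unit array."""
        first = multiplexer(
            self.first_select, self.first_memory[cycle], adjacent_first
        )
        second = multiplexer(
            self.second_select, self.second_memory[cycle], adjacent_second
        )
        return first, second


# A core at the edge of the combined array reads from its own memories.
edge = ProcessingCore(first_memory=[10, 11], second_memory=[20, 21])
print(edge.array_inputs(1, adjacent_first=99, adjacent_second=98))

# An interior core takes both inputs from the adjacent cores' array outputs.
inner = ProcessingCore(
    [10, 11], [20, 21], Source.ADJACENT_CORE, Source.ADJACENT_CORE
)
print(inner.array_inputs(0, adjacent_first=99, adjacent_second=98))
```

Setting the two selects independently is what would let a group of cores be chained into a larger array in one direction while still being fed from local memory in the other.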
Opening claim text (preview).
What is claimed is:
1. A processor, comprising: a plurality of processing cores, each of the processing cores including an arithmetic unit array including a plurality of arithmetic units, wherein each of the processing cores includes: a first memory that stores first data; a second memory that stores second data; a first multiplexer that connects a first input for receiving the first data at the arithmetic unit array to an output of the first memory in the processing core or an output of the arithmetic unit array in an adjacent processing core; and a second multiplexer that connects a second input for receiving the second data at the arithmetic unit array to an output of the second memory in the processing core or an output of the arithmetic unit array in an adjacent processing core.
2. The processor as claimed in claim 1, wherein each of the processing cores includes: a first address generator that generates a first address indicative of a destination storage of the first data output from the first memory; and a second address generator that generates a second address indicative of a destination storage of the second data output from the second memory.
3. The processor as claimed in claim 1, wherein each of the processing cores includes: an instruction memory that stores an instruction; and a third multiplexer that connects a third input for receiving an instruction at the arithmetic unit array to an output of the instruction memory in the processing core or a transmission path of an instruction in the arithmetic unit array in an adjacent processing core, and each of the arithmetic units includes: an instruction decoder that decodes an instruction; a plurality of types of arithmetic elements that perform operations based on the decoded instruction; and a register that stores data for use in operations or an operational result.
4. The processor as claimed in claim 3, wherein instructions are stored in the instruction memory in one of the plurality of processing cores, and an instruction output from the instruction memory is provided to the arithmetic unit in the processing core via the third multiplexer and to the arithmetic unit in a different processing core via the third multiplexer in the different processing core.
5. The processor as claimed in claim 3, wherein the plurality of types of arithmetic elements include a product sum operator and an arithmetic operator.
6. The processor as claimed in claim 1, wherein each of the plurality of processing cores has a resultant memory to store operational results of the plurality of arithmetic units in the processing core.
7. The processor as claimed in claim 1, wherein one of the first data and the second data is input data for use in a convolutional operation, and the other is weight data for use in the convolutional operation.
8. The processor as claimed in claim 1, further comprising: a network that interconnects the plurality of processing cores; and a controller that controls transmissions of the first data to the first memory and of the second data to the second memory and operations of the first multiplexer and the second multiplexer.
9. The processor as claimed in claim 8, wherein the controller controls to transmit the first data stored in the first memory in any of the plurality of processing cores to the first memory in a different processing core via the network during execution of an operation at the arithmetic unit array, and the controller controls to transmit the second data stored in the second memory in any of the plurality of processing cores to the second memory in a different processing core via the network during execution of an operation at the arithmetic unit array.
10. The processor as claimed in claim 8, wherein the plurality of processing cores are arranged in a matrix, and the controller: interconnects the plurality of arithmetic unit arrays aligned in a first direction via the first multiplexer and the plurality of arithmetic unit arrays aligned in a second direction orthogonal to the first direction via the second multiplexer to form a systolic array including a predetermined number of arithmetic units; connects an arithmetic unit located at an end of the plurality of arithmetic unit arrays aligned in the first direction to an output of the first memory; connects an arithmetic unit located at an end of the plurality of arithmetic unit arrays aligned in the second direction to an output of the second memory; and causes the first memory and the second memory to output the first data and the second data, respectively, and causes an arithmetic unit in the systolic array to perform operations.
11. A control method for a processor, wherein the processor has a plurality of processing cores, each including an arithmetic unit array including a plurality of arithmetic units, a first memory storing first data, a second memory storing second data, a first multiplexer connecting a first input for receiving the first data at the arithmetic unit array to an output of the first memory in the processing core or an output of an arithmetic unit array in an adjacent processing core, and a second multiplexer connecting a second input for receiving the second data at the arithmetic unit array to an output of the second memory in the processing core or an output of an arithmetic unit array in an adjacent processing core, the method comprising: connecting an output of the first memory to one of a first predetermined number of arithmetic units aligned in a first direction, connecting an output of the second memory to one of a second predetermined number of arithmetic units aligned in a direction different from the first direction, using the first multiplexer to form a path to transmit the first data output from the first memory to the first predetermined number of arithmetic units sequentially and using the second multiplexer to form a path to transmit the second data output from the second memory to the second predetermined number of arithmetic units sequentially to form a systolic array including a predetermined number of arithmetic units; and transferring the first data output from the first memory to the systolic array and transferring the second data output from the second memory to the systolic array to cause an arithmetic unit in the systolic array to perform operations.
12. The control method as claimed in claim 11, further comprising: generating a first address to be supplied to the first memory and outputting the first data corresponding to the first address from the first memory; and generating a second address to be supplied to the second memory and outputting the second data corresponding to the second address from the second memory.
13. The control method as claimed in claim 11, wherein each of the plurality of processing cores has an instruction memory storing instructions and a third multiplexer connecting a third input for receiving an instruction at the arithmetic unit array to an output of the instruction memory or a transmission path of an instruction in an arithmetic unit array in an adjacent processing core, and the method further comprises storing an instruction in the instruction memory in a processing core corresponding to a corner part of the systolic array, transferring instructions output from the instruction memory to arithmetic units in the systolic array sequentially and causing the respective arithmetic units to perform operations corresponding to the instructions.
14. The control method as claimed in claim 11, wherein the plurality of processing cores are interconnected via a
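
To make the data movement in the control method more concrete, the following is a minimal simulation of a systolic array computing a matrix product. It is a sketch under assumptions, not the claimed implementation: the claims do not fix a particular dataflow, so an output-stationary scheme is assumed here, with first data entering at the left edge and moving along rows, second data entering at the top edge and moving along columns, and each arithmetic unit performing one product-sum per cycle. The function name `systolic_matmul` and the register arrays are hypothetical.

```python
def systolic_matmul(a, b):
    """Compute the product of a (n x k) and b (k x m) on a simulated array.

    Assumed dataflow: unit (i, j) accumulates a[i][s] * b[s][j] in place.
    Edge units read from memory; interior units read what the neighbouring
    unit received on the previous cycle.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # per-unit accumulators (results)
    a_reg = [[0] * m for _ in range(n)]  # first data held in each unit
    b_reg = [[0] * m for _ in range(n)]  # second data held in each unit

    # The last product reaches unit (n-1, m-1) at cycle n + m + k - 3.
    for t in range(n + m + k - 2):
        # Visit units from the far corner backwards so that each unit reads
        # its neighbour's register before that neighbour is updated.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    # Left edge: fed from the first memory, skewed by row.
                    s = t - i
                    a_in = a[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    # Top edge: fed from the second memory, skewed by column.
                    s = t - j
                    b_in = b[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # product-sum operation
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# prints [[19, 22], [43, 50]]
```

In this model, the `j == 0` and `i == 0` branches play the role of units whose inputs are connected to a memory output, while the other branches correspond to units whose inputs come from an adjacent unit. Padding the skewed streams with zeros keeps the accumulators correct while the array fills and drains.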
Systolic arrays · CPC title
Learning methods · CPC title
Arithmetic instructions · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title
Convolutional networks [CNN, ConvNet] · CPC title