Neural processing device and method for pruning thereof

US2022300816A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2022300816-A1
Application number: US-202217655348-A
Country: US
Kind code: A1
Filing date: Mar 17, 2022
Priority date: Mar 19, 2021
Publication date: Sep 22, 2022
Grant date: (not shown)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A neural processing device and a pruning method thereof are provided. The neural processing device includes a processing unit configured to perform calculations; an L0 memory configured to store input and output data of the processing unit, wherein the input and output data include a two-dimensional weight matrix; and a weight manipulator configured to receive the two-dimensional weight matrix and partition it into preset sizes to thereby generate partitioned matrices, to generate a pruning matrix by pruning the partitioned matrix, and to transmit the pruning matrix to the processing unit.
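The partition-then-prune flow in the abstract can be illustrated with a small sketch. This is not the patented hardware implementation; it is a minimal software model under stated assumptions. The function names (`partition`, `rms`, `prune_tile`), the tile shape, the row-wise layout of groups, and the choice of RMS as the per-group representative value are all hypothetical choices made for illustration. The claims describe groups whose size equals the SIMD width and a comparison of a representative value against a threshold, which is what the sketch mimics.

```python
from math import sqrt


def partition(matrix, tile_rows, tile_cols):
    """Split a 2D list into tiles of tile_rows x tile_cols, in row-major order."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("matrix shape must be a multiple of the tile shape")
    tiles = []
    for r in range(0, rows, tile_rows):
        for c in range(0, cols, tile_cols):
            tiles.append([row[c:c + tile_cols] for row in matrix[r:r + tile_rows]])
    return tiles


def rms(values):
    """Root mean square, used here as the representative value of a group."""
    return sqrt(sum(v * v for v in values) / len(values))


def prune_tile(tile, simd_width, threshold):
    """Zero every SIMD-width group whose representative value is <= threshold."""
    pruned = []
    for row in tile:
        new_row = []
        for start in range(0, len(row), simd_width):
            group = row[start:start + simd_width]
            if rms(group) <= threshold:
                group = [0.0] * len(group)  # the whole group becomes a zero group
            new_row.extend(group)
        pruned.append(new_row)
    return pruned


# Hypothetical 2 x 4 weight matrix, split into two 2 x 2 partitioned matrices.
weights = [
    [0.9, -0.8, 0.01, 0.02],
    [0.05, -0.03, 0.7, 0.6],
]
tiles = partition(weights, tile_rows=2, tile_cols=2)
pruned_tiles = [prune_tile(t, simd_width=2, threshold=0.1) for t in tiles]
```

With these made-up values, each partitioned matrix ends up with one zero group and one untouched group. Pruning whole groups rather than individual weights is what lets a SIMD-width unit of work be dropped as a whole.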

First claim

Opening claim text (preview).

What is claimed is:

1. A neural processing device comprising: a processing unit configured to perform calculations; an L0 memory configured to store input and output data of the processing unit, wherein the input and output data include a two-dimensional weight matrix; and a weight manipulator configured to receive the two-dimensional weight matrix and partition it into preset sizes to thereby generate partitioned matrices, to generate a pruning matrix by pruning the partitioned matrix, and to transmit the pruning matrix to the processing unit.

2. The neural processing device of claim 1, wherein the processing unit comprises a PE array that performs two-dimensional calculations and that includes at least one processing element, and the processing element receives an input in SIMD (Single Instruction/Multiple Data).

3. The neural processing device of claim 2, wherein the size of the partitioned matrix is determined based on the width of the SIMD of the processing element.

4. The neural processing device of claim 2, wherein the weight manipulator comprises: a width identifier configured to check the width of the SIMD of the processing element and generate a confirmation signal; a weight initializer configured to receive the confirmation signal and initialize the two-dimensional weight matrix; a matrix divider configured to partition the two-dimensional weight matrix according to the width of the SIMD of the processing element and generate the partitioned matrices; and a pruner configured to generate the pruning matrix by pruning the partitioned matrix.

5. The neural processing device of claim 4, wherein the partitioned matrix contains at least one group, and the number of elements in the group is equal to the width of the SIMD of the processing element.

6. The neural processing device of claim 5, wherein the pruner is configured to: generate a representative value of the group, and compare the representative value with a threshold value and determine whether to convert the group into a zero group.

7. The neural processing device of claim 6, wherein the two-dimensional weight matrix contains weights as elements, and the weights and the threshold value are trained via an artificial neural network.

8. The neural processing device of claim 6, wherein the threshold value comprises an initial threshold value and an updated threshold value, and the pruner generates dry run information by performing pruning through the initial threshold value, and generates the pruning matrix through the updated threshold value.

9. The neural processing device of claim 8, wherein the weight manipulator further comprises a load balance unit configured to receive the dry run information and generate the updated threshold value.

10. The neural processing device of claim 9, wherein the load balance unit comprises: a zero group counter configured to receive the dry run information, to count zero groups, and to generate counting information; a threshold updater configured to receive the counting information and generate the updated threshold value.

11. The neural processing device of claim 8, wherein the updated threshold value comprises a partitioned matrix threshold value corresponding to each partitioned matrix.

12. A method for pruning of a neural processing device, comprising: checking the width of SIMD of a processing element, initializing a two-dimensional weight matrix, partitioning the two-dimensional weight matrix into partitioned matrices based on the width of the SIMD, and pruning the partitioned matrix using a threshold value.

13. The method for pruning of a neural processing device of claim 12, wherein the partitioned matrix contains at least one group, and the group has a size equal to the width of the SIMD, and the pruning comprises: generating a representative value of the group, and changing the group to a zero group if the representative value is less than or equal to the threshold value.

14. The method for pruning of a neural processing device of claim 13, wherein the representative value comprises any one of a mean value, a minimum value, a maximum value, a median value, and a root mean square (RMS) value.

15. The method for pruning of a neural processing device of claim 13, wherein the two-dimensional weight matrix contains at least one weight, the weight and the threshold value are trained via an artificial neural network, and the threshold value is trained in a direction in which the number of zero groups is uniformly distributed for each partitioned matrix.

16. The method for pruning of a neural processing device of claim 13, wherein the threshold value comprises an initial threshold value and an updated threshold value, and the pruning comprises: generating dry run information by pruning with the initial threshold value, generating counting information by counting the zero groups based on the dry run information, generating the updated threshold value based on the counting information, and generating the pruning matrix by pruning based on the updated threshold value.

17. The method for pruning of a neural processing device of claim 16, wherein the generating the updated threshold value comprises: selecting a partitioned matrix having the largest number of zero groups in the counting information as a reference partitioned matrix, maintaining an initial threshold value of the reference partitioned matrix, and adjusting threshold values of the remaining partitioned matrices other than the reference partitioned matrix.

18. The method for pruning of a neural processing device of claim 12, wherein the pruning comprises generating a pruning matrix by pruning the partitioned matrix, and the pruning method of a neural processing device, further comprising: performing, by the processing element, calculations using the pruning matrix.

19. The method for pruning of a neural processing device of claim 18, wherein the pruning matrix contains at least one group, the at least one group comprises a zero group in which all elements are zero and a non-zero group in which at least one non-zero element is included, and the performing calculations comprises skipping the calculation of the zero group.

20. The method for pruning of a neural processing device of claim 12, wherein the two-dimensional weight matrix is a form obtained by rearranging a four-dimensional tensor in two dimensions.
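The dry run and threshold update described in claims 16 and 17, together with the zero-group skipping of claim 19, can be modeled in a few lines. This is a hedged sketch, not the claimed circuit: the claims say only that the thresholds of the non-reference partitioned matrices are "adjusted", so the rule used here (raise each threshold just far enough to match the reference's zero-group count) is an assumption, as are all function names, the RMS representative value, and the sample weights.

```python
from math import sqrt


def groups_of(tile, simd_width):
    """Yield (row index, start column, group) for each SIMD-width group."""
    for r, row in enumerate(tile):
        for start in range(0, len(row), simd_width):
            yield r, start, row[start:start + simd_width]


def representative(group):
    """RMS of the group; claim 14 also lists mean, min, max and median."""
    return sqrt(sum(v * v for v in group) / len(group))


def prune(tile, simd_width, threshold):
    """Return (pruned tile, number of zero groups) for one partitioned matrix."""
    pruned = [list(row) for row in tile]
    zero_groups = 0
    for r, start, group in groups_of(tile, simd_width):
        if representative(group) <= threshold:
            pruned[r][start:start + simd_width] = [0.0] * len(group)
            zero_groups += 1
    return pruned, zero_groups


def update_thresholds(tiles, simd_width, initial_threshold):
    """Dry run with the initial threshold, then balance zero-group counts."""
    counts = [prune(t, simd_width, initial_threshold)[1] for t in tiles]  # dry run
    target = max(counts)               # zero groups in the reference tile
    reference = counts.index(target)   # tile with the most zero groups
    thresholds = []
    for i, tile in enumerate(tiles):
        if i == reference or target == 0:
            thresholds.append(initial_threshold)  # reference keeps its threshold
            continue
        reps = sorted(representative(g) for _, _, g in groups_of(tile, simd_width))
        # Assumed adjustment rule: smallest threshold that zeroes `target` groups.
        thresholds.append(max(initial_threshold, reps[target - 1]))
    return counts, thresholds


def dot_skipping_zero_groups(weight_row, inputs, simd_width):
    """Dot product that skips groups whose weights are all zero."""
    total, skipped = 0.0, 0
    for start in range(0, len(weight_row), simd_width):
        group = weight_row[start:start + simd_width]
        if not any(group):
            skipped += 1
            continue
        total += sum(w * x for w, x in zip(group, inputs[start:start + simd_width]))
    return total, skipped


# Two hypothetical 2 x 4 partitioned matrices with a SIMD width of 2.
tiles = [
    [[0.9, -0.8, 0.01, 0.02], [0.03, 0.02, 0.7, 0.6]],
    [[0.5, 0.4, 0.3, 0.2], [0.01, 0.01, 0.6, 0.5]],
]
counts, thresholds = update_thresholds(tiles, simd_width=2, initial_threshold=0.1)
results = [prune(t, 2, th) for t, th in zip(tiles, thresholds)]
pruned_tiles = [p for p, _ in results]
final_counts = [n for _, n in results]
```

In this example the dry run finds two zero groups in the first partitioned matrix and one in the second, so the second threshold is raised until both have two. Equal zero-group counts mean processing elements working on different partitioned matrices skip the same amount of work, which is the load-balancing motive the claims point to.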

Assignees

Rebellions Inc

Inventors

Classifications

  • G06F9/3834 (Primary)

    Maintaining memory consistency · CPC title

  • G06N3/082 (Primary)

    modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • using electronic means · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022300816A1 cover?
A neural processing device and method for pruning thereof are provided. The neural processing device includes a processing unit configured to perform calculations, an L0 memory configured to store input and output data of the processing unit, wherein the input and output data include a two-dimensional weight matrix and a weight manipulator configured to receive the two-dimensional weight matrix…
Who is the assignee on this patent?
Rebellions Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/3834. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 22, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).