Neural network layer folding

US12561566B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12561566-B2
Application number: US-202117399374-A
Country: US
Kind code: B2
Filing date: Aug 11, 2021
Priority date: Apr 7, 2021
Publication date: Feb 24, 2026
Grant date: Feb 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure describes neural network reduction techniques for decreasing the number of neurons or layers in a neural network. Embodiments of the method, apparatus, non-transitory computer readable medium, and system are configured to receive a trained neural network and replace certain non-linear activation units with an identity function. Next, linear blocks may then be folded to form a single block in places where the non-linear activation units were replaced by an identity function. Such techniques may reduce the number of layers in the neural network, which may optimize power and computation efficiency of the neural network architecture (e.g., without unduly influencing the accuracy of the network model).
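The folding step in the abstract can be illustrated with a small sketch. It assumes two fully connected affine layers whose intermediate activation has already been replaced by the identity function, so that y = W2 (W1 x + b1) + b2 collapses to y = (W2 W1) x + (W2 b1 + b2). The helper names (`matmul`, `matvec`, `affine`, `fold_affine`) and the example weights are hypothetical and are not taken from the patent.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in a]


def affine(w, b, x):
    """Apply the affine map x -> W x + b."""
    return [s + bias for s, bias in zip(matvec(w, x), b)]


def fold_affine(w1, b1, w2, b2):
    """Fold W2 (W1 x + b1) + b2 into a single affine map W x + b.

    Valid only when nothing non-linear sits between the two layers.
    """
    w = matmul(w2, w1)
    b = [s + bias for s, bias in zip(matvec(w2, b1), b2)]
    return w, b


# Hypothetical weights: a 2 -> 3 layer followed by a 3 -> 2 layer.
w1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b1 = [0.5, -0.5, 1.0]
w2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]
b2 = [0.0, 1.0]
x = [2.0, -3.0]

w, b = fold_affine(w1, b1, w2, b2)
two_layer = affine(w2, b2, affine(w1, b1, x))  # original pair of layers
one_layer = affine(w, b, x)                    # folded single layer
print(two_layer, one_layer)
```

The folded layer gives the same output as the original pair while storing a single 2 x 2 weight matrix instead of a 3 x 2 and a 2 x 3 matrix. The same algebra applies to convolutions, although composing kernels needs more bookkeeping than this sketch shows.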

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining, through a cloud and by a neural network design apparatus, a neural network provided by a user device and including an affine function and a non-linear activation function; replacing, by the neural network design apparatus, the non-linear activation function with a parameterized activation function that includes a target affine function and a product of a linearity parameter and the non-linear activation function; iteratively adjusting, by the neural network design apparatus, the linearity parameter of the parameterized activation function to obtain an approximately affine activation function based on an auxiliary loss term that encourages the parameterized activation function to approach the target affine function; reducing, by the neural network design apparatus, the neural network by combining the approximately affine activation function with the affine function of the neural network based on the target affine function to obtain a reduced neural network; and sending, through the cloud and by the neural network design apparatus, the reduced neural network to the user device to allow a computing device to execute the reduced neural network.

2. The method of claim 1, wherein: the parameterized activation function includes the non-linear activation function, an additive inverse of a product of the linearity parameter and the non-linear activation function, and a product of the linearity parameter and a target affine function.

3. The method of claim 1, wherein: the parameterized activation function includes further comprises a product of an additional parameter and a target affine function.

4. The method of claim 1, wherein: the iteratively adjusting the linearity parameter comprises selecting a value for the linearity parameter, computing the auxiliary loss term based on the selected value, and updating the value for the linearity parameter based on the auxiliary loss term.

5. The method of claim 1, wherein: the auxiliary loss term encourages the linearity parameter to approach a value that causes the parameterized activation function to approach a target affine function.

6. The method of claim 1, wherein: the combining the approximately affine activation function with the affine function of the neural network comprises combining the approximately affine activation function with a first affine function before the approximately affine activation function and a second affine function after the approximately affine activation function.

7. The method of claim 1, wherein: the combining the approximately affine activation function with the affine function of the neural network comprises eliminating a skip connection of the neural network.

8. The method of claim 1, further comprising: replacing a plurality of non-linear activation functions with a plurality of parameterized activation functions having a same linearity parameter; and combining the plurality of non-linear activation functions with a plurality of affine functions to obtain the reduced neural network.

9. The method of claim 8, wherein: the plurality of non-linear activation functions is bypassed by a same skip connection.

10. The method of claim 8, wherein: the plurality of non-linear activation functions comprises a kernel boundary of a convolutional neural network.

11. The method of claim 1, further comprising: refining the reduced neural network based on a loss function that does not include the auxiliary loss term.

12. The method of claim 1, wherein: the non-linear activation function comprises one or more rectified linear unit (ReLU) blocks and the parameterized activation function comprises one or more parametric ReLU blocks.

13. The method of claim 1, wherein: the neural network comprises a convolutional neural network (CNN).

14. The method of claim 13, wherein: the reduced neural network comprises the CNN with a reduced number of layers.

15. A method comprising: obtaining, through a cloud and by a neural network design apparatus, a neural network provided by a user device and including an affine function and a non-linear activation function; replacing, by the neural network design apparatus, the non-linear activation function with a parameterized activation function that includes target affine function and a product of a linearity parameter and the non-linear activation function; computing, by the neural network design apparatus, an auxiliary loss term based on a value selected for the linearity parameter of the parameterized activation function, wherein the auxiliary loss term encourages the parameterized activation function to approach the target affine function; iteratively updating, by the neural network design apparatus, the value for the linearity parameter of the parameterized activation function based on the auxiliary loss term to obtain an approximately affine activation function; combining, by the neural network design apparatus, the approximately affine activation function with the affine function of the neural network to obtain a reduced neural network; and sending, through the cloud and by the neural network design apparatus, the reduced neural network to the user device to allow a computing device to execute the reduced neural network.

16. The method of claim 15, further comprising: refining the reduced neural network based on a loss function that does not include the auxiliary loss term.

17. A neural network design apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, cause the neural network design apparatus to: (a) obtain, through a cloud, a neural network provided by a user device, and including an affine function and a non-linear activation function; (b) modify the neural network by replacing the non-linear activation function with a parameterized activation function that includes a target affine function and a product of a linearity parameter and the non-linear activation function; (c) iteratively adjust the linearity parameter of the parameterized activation function to obtain an approximately affine activation function based on an auxiliary loss term that encourages the parameterized activation function to approach the target affine function; (d) combine the approximately affine activation function with the affine function of the neural network based on the target affine function to obtain a reduced neural network; and (e) send, through the cloud, the reduced neural network to the user device to allow a computing device to execute the reduced neural network.

18. The neural network design apparatus of claim 17, wherein: the instructions, when executed by the processor, further cause the neural network design apparatus to select a value for the linearity parameter, compute the auxiliary loss term based on the selected value, and update the value for the linearity parameter based on the auxiliary loss term.

19. The neural network design apparatus of claim 17, wherein: the instructions, when executed by the processor, further cause the neural network design apparatus to combine the approximately affine activation function with a first affine function before the approximately affine activation function and a second affine function after the approximately affine activation function.

20. The neural network design apparatus of claim 17, wherein: the instructions, when executed by the processor, furt
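The parameterized activation and the auxiliary loss recited in the claims can be sketched roughly as follows. The sketch makes several assumptions that are not stated in the claim text: the non-linear activation is ReLU, the target affine function is the identity, the activation takes the form described in claim 2 (the activation, minus the linearity parameter times the activation, plus the linearity parameter times the target), and the auxiliary loss is a simple squared penalty (1 - alpha)^2. All function and variable names are hypothetical.

```python
def relu(x):
    """Rectified linear unit on a scalar."""
    return max(0.0, x)


def parameterized_activation(x, alpha):
    """relu(x) - alpha * relu(x) + alpha * x.

    Assumption: the target affine function is the identity. With alpha = 0
    this is plain ReLU; with alpha = 1 it is the identity; in between it
    behaves like a parametric ReLU with negative slope alpha.
    """
    return relu(x) - alpha * relu(x) + alpha * x


def auxiliary_loss(alpha):
    """Assumed penalty that is zero only when the activation is affine."""
    return (1.0 - alpha) ** 2


def auxiliary_grad(alpha):
    """Derivative of the assumed auxiliary loss with respect to alpha."""
    return -2.0 * (1.0 - alpha)


# Toy loop: gradient descent on the auxiliary term alone. In real training
# this term would be added to the task loss, so alpha would only reach 1
# where linearizing the activation does not hurt accuracy too much.
alpha = 0.0
learning_rate = 0.1
for _ in range(200):
    alpha -= learning_rate * auxiliary_grad(alpha)

print(alpha, auxiliary_loss(alpha))
print([parameterized_activation(v, alpha) for v in (-2.0, 0.0, 3.0)])
```

Once the linearity parameter has settled at a value where the activation is approximately affine, the neighbouring affine layers can be combined into one, and the reduced network can then be refined with the ordinary loss alone, as claims 11 and 16 describe.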

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • Architecture, e.g. interconnection topology · CPC title

  • Activation functions · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12561566B2 cover?
The present disclosure describes neural network reduction techniques for decreasing the number of neurons or layers in a neural network. Embodiments of the method, apparatus, non-transitory computer readable medium, and system are configured to receive a trained neural network and replace certain non-linear activation units with an identity function. Next, linear blocks may then be folded to fo…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 24, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).