Neural network model compression with selective structured weight unification

US2021217204A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2021217204-A1
Application number	US-202017086642-A
Country	US
Kind code	A1
Filing date	Nov 2, 2020
Priority date	Jan 10, 2020
Publication date	Jul 15, 2021
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer program, or computer system is provided for compressing a neural network model. One or more blocks are identified from among a superblock corresponding to a multi-dimensional tensor associated with a neural network. A set of weight coefficients associated with the superblock is unified. A model of the neural network is compressed based on the unified set of weight coefficients.
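The steps in the abstract can be illustrated with a small sketch. It is not the patent's implementation. It assumes a 2-D superblock stored as a list of lists, square tiles as the "blocks", the mean absolute value as the shared magnitude, and a hypothetical `ratio` parameter that decides how many of the lowest-loss blocks are unified. The names `split_blocks`, `unification_loss`, and `unify_superblock` are made up for this illustration.

```python
import math

def split_blocks(superblock, size):
    """Yield (row, col) origins of size x size tiles covering a 2-D superblock."""
    rows, cols = len(superblock), len(superblock[0])
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            yield r, c

def unification_loss(superblock, r, c, size):
    """Return (mean |w|, sum of | |w| - mean |) for the tile at (r, c).

    Assumption: the loss is the L1 distance between the magnitudes and
    their mean. The page does not say how the loss is defined.
    """
    rows, cols = len(superblock), len(superblock[0])
    vals = [abs(superblock[i][j])
            for i in range(r, min(r + size, rows))
            for j in range(c, min(c + size, cols))]
    mean_abs = sum(vals) / len(vals)
    return mean_abs, sum(abs(v - mean_abs) for v in vals)

def unify_superblock(superblock, size=2, ratio=0.5):
    """Unify the `ratio` fraction of tiles that have the lowest loss.

    Returns (new_weights, mask); mask[i][j] is True where a weight was unified.
    """
    rows, cols = len(superblock), len(superblock[0])
    scored = []
    for r, c in split_blocks(superblock, size):
        mean_abs, loss = unification_loss(superblock, r, c, size)
        scored.append((loss, r, c, mean_abs))
    scored.sort()  # lowest unification loss first
    keep = math.ceil(ratio * len(scored))

    out = [row[:] for row in superblock]
    mask = [[False] * cols for _ in range(rows)]
    for _, r, c, mean_abs in scored[:keep]:
        for i in range(r, min(r + size, rows)):
            for j in range(c, min(c + size, cols)):
                # Same absolute value for the whole tile, original sign kept.
                out[i][j] = math.copysign(mean_abs, superblock[i][j])
                mask[i][j] = True
    return out, mask

# Illustrative 4 x 4 superblock (made-up numbers).
W = [[0.9, -1.1, 0.2, 3.0],
     [1.0, -1.0, -2.5, 0.1],
     [0.5, 0.5, 4.0, -0.3],
     [-0.5, 0.5, 0.2, 1.7]]
unified, mask = unify_superblock(W, size=2, ratio=0.5)
```

In this example the two left-hand tiles already have nearly equal magnitudes, so they are the ones selected, and the two right-hand tiles are left untouched. Choosing blocks by lowest loss is one plausible reading of "selective"; the full description may define the selection differently.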

First claim

Opening claim text (preview).

What is claimed is:
1. A method for compressing a neural network model, executable by a processor, comprising: identifying one or more blocks from among a superblock corresponding to a multi-dimensional tensor associated with a neural network; unifying a set of weight coefficients associated with the superblock; and compressing a model of the neural network based on the unified set of weight coefficients.
2. The method of claim 1, wherein unifying the set of weight coefficients comprises: quantizing the weight coefficients; and selecting the subset of weight coefficients based on minimizing a unification loss value associated with the weight coefficients.
3. The method of claim 2, further comprising training the deep neural network based on back-propagating the minimized unification loss value.
4. The method of claim 2, wherein one or more weight coefficients from among the subset of weight coefficients are fixed to one or more values based on back-propagating the minimized unification loss value.
5. The method of claim 4, further comprising updating one or more non-fixed weight coefficients from among the subset of weight coefficients based on determining a gradient and a unifying mask associated with the set of weight coefficients.
6. The method of claim 1, further comprising compressing the set of weight coefficients by quantizing and entropy-coding the subset of weight coefficients.
7. The method of claim 1, wherein the unified set of weight coefficients comprises one or more weight coefficients having a same absolute value.
8. A computer system for compressing a neural network model, the computer system comprising: one or more computer-readable non-transitory storage media configured to store computer program code; and one or more computer processors configured to access said computer program code and operate as instructed by said computer program code, said computer program code including: identifying code configured to cause the one or more computer processors to identify one or more blocks from among a superblock corresponding to a multi-dimensional tensor associated with a neural network; unifying code configured to cause the one or more computer processors to unify a set of weight coefficients associated with the superblock; and compressing code configured to cause the one or more computer processors to compress a model of the neural network based on the unified set of weight coefficients.
9. The computer system of claim 8, wherein the unifying code comprises: quantizing code configured to cause the one or more computer processors to quantize the weight coefficients; and selecting code configured to cause the one or more computer processors to select the subset of weight coefficients based on minimizing a unification loss value associated with the weight coefficients.
10. The computer system of claim 9, further comprising training code configured to cause the one or more computer processors to train the deep neural network based on back-propagating the minimized unification loss value.
11. The computer system of claim 9, wherein one or more weight coefficients from among the subset of weight coefficients are fixed to one or more values based on back-propagating the minimized unification loss value.
12. The computer system of claim 11, further comprising updating code configured to cause the one or more computer processors to update one or more non-fixed weight coefficients from among the subset of weight coefficients based on determining a gradient and a unifying mask associated with the set of weight coefficients.
13. The computer system of claim 8, further comprising compressing code configured to cause the one or more computer processors to compress the set of weight coefficients by quantizing and entropy-coding the subset of weight coefficients.
14. The computer system of claim 8, wherein the unified set of weight coefficients comprises one or more weight coefficients having a same absolute value.
15. A non-transitory computer readable medium having stored thereon a computer program for compressing a neural network model, the computer program configured to cause one or more computer processors to: identify one or more blocks from among a superblock corresponding to a multi-dimensional tensor associated with a neural network; unify a set of weight coefficients associated with the superblock; and compress a model of the neural network based on the unified set of weight coefficients.
16. The computer readable medium of claim 15, wherein the unifying code comprises: quantizing code configured to cause the one or more computer processors to quantize the weight coefficients; and selecting code configured to cause the one or more computer processors to select the subset of weight coefficients based on minimizing a unification loss value associated with the weight coefficients.
17. The computer readable medium of claim 16, further comprising training code configured to cause the one or more computer processors to train the deep neural network based on back-propagating the minimized unification loss value.
18. The computer readable medium of claim 16, wherein one or more weight coefficients from among the subset of weight coefficients are fixed to one or more values based on back-propagating the minimized unification loss value.
19. The computer readable medium of claim 18, further comprising updating code configured to cause the one or more computer processors to update one or more non-fixed weight coefficients from among the subset of weight coefficients based on determining a gradient and a unifying mask associated with the set of weight coefficients.
20. The computer readable medium of claim 15, further comprising compressing code configured to cause the one or more computer processors to compress the set of weight coefficients by quantizing and entropy-coding the subset of weight coefficients.
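Two of the dependent claims are easier to follow with a toy example: the masked update of non-fixed weights (claims 4 and 5) and the quantize-then-entropy-code step (claim 6). The sketch below is an assumption-laden illustration, not the claimed implementation. It uses plain SGD as the update rule, a boolean mask where True means "fixed", uniform scalar quantization with a hypothetical `step`, and the empirical Shannon entropy as a stand-in for the cost of an ideal entropy coder. The function names are invented for this example.

```python
import math
from collections import Counter

def masked_sgd_step(weights, grads, mask, lr=0.1):
    """One SGD step that leaves masked (fixed) weights unchanged.

    Assumption: mask[i][j] is True for weights fixed by unification;
    only the remaining weights move along the negative gradient.
    """
    return [[w if fixed else w - lr * g
             for w, g, fixed in zip(w_row, g_row, m_row)]
            for w_row, g_row, m_row in zip(weights, grads, mask)]

def quantize(weights, step):
    """Uniform scalar quantization to integer indices.

    Python's round() sends exact halves to the nearest even integer.
    """
    return [[round(w / step) for w in row] for row in weights]

def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol.

    This approximates what an ideal entropy coder would spend per
    symbol; a real codec adds overhead on top of this figure.
    """
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Made-up values: the first row is fixed, the second row is trainable.
weights = [[1.0, -1.0], [0.3, 0.7]]
grads = [[0.5, 0.5], [1.0, -1.0]]
mask = [[True, True], [False, False]]

updated = masked_sgd_step(weights, grads, mask, lr=0.1)
indices = quantize(updated, step=0.5)
flat = [q for row in indices for q in row]
bits_signed = entropy_bits(flat)                       # signs included
bits_magnitude = entropy_bits([abs(q) for q in flat])  # magnitudes only
```

Comparing `bits_signed` with `bits_magnitude` hints at why a shared absolute value helps: once magnitudes repeat, the magnitude stream has fewer distinct symbols and the signs can be stored separately at one bit each. How the publication actually codes signs and magnitudes is not stated on this page.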

Assignees

Tencent America LLC

Inventors

Classifications

G06N3/082
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
G06N3/063
using electronic means · CPC title
G06T9/002 · Primary
using neural networks · CPC title
H04N19/176 · Primary
the region being a block, e.g. a macroblock · CPC title
G06N3/084
Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

View patent family 76763704

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021217204A1 cover? A method, computer program, or computer system is provided for compressing a neural network model. One or more blocks are identified from among a superblock corresponding to a multi-dimensional tensor associated with a neural network. A set of weight coefficients associated with the superblock is unified. A model of the neural network is compressed based on the unified set of weight coefficients.
Who is the assignee on this patent? Tencent America LLC
What technology area does this patent fall under? Primary CPC classification G06T9/002. Mapped technology areas include Physics.
When was this patent published? Publication date Jul 15, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).