Method and system for compressing application data for operations on multi-core systems

US11599367B2 · US · B2

Patent metadata
Publication number: US-11599367-B2
Application number: US-202016752239-A
Country: US
Kind code: B2
Filing date: Jan 24, 2020
Priority date: Jan 24, 2020
Publication date: Mar 7, 2023
Grant date: Mar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method to compress application control data, such as weights for a layer of a convolutional neural network, is disclosed. A multi-core system for executing at least one layer of the convolutional neural network includes a storage device storing a compressed weight matrix of a set of weights of the at least one layer of the convolutional network and a decompression matrix. The compressed weight matrix is formed by matrix factorization and quantization of a floating point value of each weight to a floating point format. A decompression module is operable to obtain an approximation of the weight values by decompressing the compressed weight matrix through the decompression matrix. A plurality of cores executes the at least one layer of the convolutional neural network with the approximation of weight values to produce an inference output.
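The abstract describes storing quantized, factorized weights together with a decompression matrix and recovering approximate weights at run time. The following is a minimal sketch of that flow, not the patent's implementation. It assumes a uniform 4-bit quantizer (the abstract does not say which quantization scheme is used), assumes that decompression means expanding the codes and multiplying by the decompression matrix in that order, and uses made-up factor values in place of an actual factorization step. All function names are hypothetical.

```python
# Hypothetical sketch: uniform 4-bit quantization of factorized weights,
# followed by expansion and decompression. Not the patent's implementation.

def quantize_4bit(matrix):
    """Map each float to an integer code in 0..15 on a uniform grid.

    Returns (codes, lo, step); lo and step are kept to expand the codes later.
    """
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    step = (hi - lo) / 15 or 1.0  # fall back to 1.0 when all values are equal
    codes = [[min(15, max(0, round((v - lo) / step))) for v in row]
             for row in matrix]
    return codes, lo, step


def expand(codes, lo, step):
    """Turn 4-bit codes back into approximate floating point values."""
    return [[lo + c * step for c in row] for row in codes]


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def decompress(codes, lo, step, decompression_matrix):
    """Approximate the original weights: expand the codes, then multiply
    by the decompression matrix (the multiplication order is an assumption)."""
    return matmul(expand(codes, lo, step), decompression_matrix)


if __name__ == "__main__":
    # Made-up factors of a rank-1, 4x3 weight matrix (illustration only).
    factorized = [[0.5], [-1.0], [0.25], [2.0]]   # 4x1 factorized weights
    decompression_matrix = [[1.0, -0.5, 0.2]]     # 1x3 decompression matrix
    codes, lo, step = quantize_4bit(factorized)
    approx = decompress(codes, lo, step, decompression_matrix)
    exact = matmul(factorized, decompression_matrix)
    worst = max(abs(a - e)
                for row_a, row_e in zip(approx, exact)
                for a, e in zip(row_a, row_e))
    print("4-bit codes:", codes)
    print("largest weight error:", worst)
```

With this quantizer, each expanded value differs from the original factorized weight by at most half a quantization step, so the error in the reconstructed weights depends on that step and on the magnitude of the decompression matrix entries.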

First claim

Opening claim text (preview).

What is claimed is:

1. A multi-core system for executing at least one layer of a convolutional neural network, the system comprising: a storage device storing a compressed weight matrix of a set of weights of the at least one layer of the convolutional network and a decompression matrix, wherein the compressed weight matrix is formed by matrix factorization and quantization of a floating point format of each weight to a floating point composite; a decompression module operable to obtain an approximation of the weight values by decompressing the compressed weight matrix by expanding the floating point composite of each weight to the floating point format; and a plurality of cores executing the at least one layer of the convolutional neural network with the approximation of weight values to produce an inference output.

2. The system of claim 1, wherein the compressed weight matrix is formed by matrix factorization and wherein a decompression matrix is stored in the storage device, wherein the decompression module is operable to obtain the approximation of the weight values by decompressing the compressed weight matrix through the decompression matrix.

3. The system of claim 2, wherein the matrix factorization is performed by singular value decomposition.

4. The system of claim 2, wherein the matrix factorization is performed by principal components analysis.

5. The system of claim 1, wherein the compressed weight matrix is encoded prior to being stored in the storage device.

6. The system of claim 5, wherein the encoding is a Huffman encoding procedure.

7. The system of claim 1, wherein the floating point format is a 32-bit single precision floating point number in accordance with the IEEE 754 standard and the floating point composite is a four-bit value.

8. The system of claim 1, wherein the convolutional neural network is an image classification model including image based inputs and wherein the inference output is a classification of the image.

9. The system of claim 1, wherein the storage device includes a second compressed weight matrix of a second set of weights associated with a second layer of the convolutional neural network, wherein the decompression module is operable to obtain a second approximation of the second set of weights, and wherein the plurality of cores executes the second layer of the convolutional neural network with the second approximation of weight values.

10. A method of compression of a set of weights for a layer of a convolutional neural network, the method comprising: performing matrix factorization of the set of weights to produce a decompression matrix and a set of factorized weights for the quantization; compressing the set of weights by quantization of a floating point format of each weight to a floating point composite; and storing the decompression matrix and the compressed set of weights in a storage device of a multi-core device configured to execute the convolutional neural network.

11. The method of claim 10, wherein the matrix factorization is performed by singular value decomposition.

12. The method of claim 10, wherein the matrix factorization is performed by principal components analysis.

13. The method of claim 10, further comprising: determining an approximation of the weight values by decompressing the compressed weight matrix through the decompression matrix; and executing the layer of the convolutional neural network by a plurality of cores of the multi-core device with the approximation of weight values to produce an inference output.

14. The method of claim 10, further comprising encoding the compressed weight matrix.

15. The method of claim 14, wherein the encoding is a Huffman encoding procedure.

16. The method of claim 10, wherein the floating point format is a 32-bit single precision floating point number in accordance with the IEEE 754 standard and the floating point composite is a four-bit value.

17. The method of claim 10, wherein the convolutional neural network is an image classification model including image based inputs and wherein the inference output is a classification of the image.

18. The method of claim 10, further comprising: performing matrix factorization of a second set of weights associated with a second layer of the convolutional neural network to produce a second decompression matrix and a second set of factorized weights; compressing the second set of factorized weights by quantization of a floating point value of each weight to a floating point format; and storing the second compressed set of factorized weights and the second decompression matrix in a storage device of a multi-core device configured to execute the second layer of the convolutional neural network.

19. A method of image classification comprising: performing matrix factorization of a set of weights of a layer of a convolutional neural network image classification model to produce a decompression matrix and a set of factorized weights for the quantization; compressing the set of weights by quantization of a floating point format of each weight to a floating point composite; storing the compressed set of weights and decompression matrix in a storage device of a multi-core device; determining an approximation of the weight values by decompressing the compressed weight matrix through the decompression matrix and expanding the floating point composite of each of the weight values; inputting features of an unknown image to the convolutional neural network image classification model; and executing the layer of the convolutional neural network model by a plurality of cores of the multi-core device with the approximation of weight values to produce an inference output classifying the unknown image.
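Claims 5, 6, 14 and 15 add an encoding step applied to the compressed weight matrix before storage. The sketch below takes the named encoding procedure to mean Huffman coding and applies it to a list of four-bit codes. It is an illustration under that assumption, not the claimed implementation: the helper names (`huffman_code`, `encode`, `decode`) and the sample codes are hypothetical, and bits are kept as a text string for readability rather than packed into bytes.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code word.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their code words.
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words of all symbols into one bit string."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Recover the symbols; works because no code word prefixes another."""
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


if __name__ == "__main__":
    # Made-up four-bit codes with a skewed distribution (illustration only).
    codes = [0, 0, 0, 0, 7, 7, 15, 3, 0, 0]
    table = huffman_code(codes)
    bits = encode(codes, table)
    assert decode(bits, table) == codes
    print("fixed-width bits:", 4 * len(codes), "Huffman bits:", len(bits))
```

The encoding only saves space when some codes occur much more often than others; for codes that are close to uniformly distributed, the result stays near four bits per weight, and the code table itself also has to be stored.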

Assignees

Cornami Inc

Inventors

Classifications

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title

  • Neural networks · CPC title

  • Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11599367B2 cover?
A system and method to compress application control data, such as weights for a layer of a convolutional neural network, is disclosed. A multi-core system for executing at least one layer of the convolutional neural network includes a storage device storing a compressed weight matrix of a set of weights of the at least one layer of the convolutional network and a decompression matrix. The compr…
Who is the assignee on this patent?
Cornami Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/445. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 7, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).