Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G06N3/0495. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 13, 2023 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Model quantization for software engineering tasks

US2023222334A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2023222334-A1
Application number	US-202217572459-A
Country	US
Kind code	A1
Filing date	Jan 10, 2022
Priority date	Jan 10, 2022
Publication date	Jul 13, 2023
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A deep learning model is quantized during its training to perform a target software engineering task. During training, a portion of the full-precision floating point weights is quantized into INT4 or INT8 data types through scalar quantization or product quantization to make the model more resilient to quantization and to reduce the noise between the quantized and full-precision model outputs. In scalar quantization, each sub-block consists of a single weight that is mapped into a codeword of a codebook. In product quantization, an identity matrix and a codebook of centroids is used to map a quantized weight into its original value.
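The codebook idea in the abstract can be illustrated with a small sketch. This is not the patent's implementation: the function names (`uniform_codebook`, `kmeans_codebook`, `encode`, `decode`) are hypothetical, the K-means variant clusters single weights (sub-blocks of size one) rather than multi-weight sub-vectors, and the centroid initialization is an arbitrary deterministic choice made for the example. It shows one way a weight matrix could be mapped to a matrix of codeword indices and reconstructed from the codebook.

```python
def uniform_codebook(weights, n_bits):
    """Evenly spaced codewords spanning the observed weight range (scalar quantization)."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** n_bits
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    return [lo + i * step for i in range(levels)]


def nearest_index(value, codebook):
    """Index of the codeword closest to value."""
    return min(range(len(codebook)), key=lambda i: abs(value - codebook[i]))


def kmeans_codebook(weights, k, iterations=20):
    """Centroids from 1-D K-means. Assumption: sub-blocks of a single weight."""
    lo, hi = min(weights), max(weights)
    # Assumed deterministic initialization: centroids spread across the range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for w in weights:
            buckets[nearest_index(w, centroids)].append(w)
        # Keep the old centroid when a cluster receives no weights.
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return centroids


def encode(weight_matrix, codebook):
    """Index matrix: each weight replaced by the index of its codeword."""
    return [[nearest_index(w, codebook) for w in row] for row in weight_matrix]


def decode(index_matrix, codebook):
    """Approximate weights recovered by looking indices up in the codebook."""
    return [[codebook[i] for i in row] for row in index_matrix]


matrix = [[-1.0, -0.9, 0.0], [0.1, 1.0, 1.1]]
flat = [w for row in matrix for w in row]

centroids = kmeans_codebook(flat, k=3)
index_matrix = encode(matrix, centroids)
print(index_matrix)
print(decode(index_matrix, centroids))
```

Only the small index matrix and the codebook need to be stored; with a 16-entry codebook each index fits in 4 bits, and with 256 entries in 8 bits.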

First claim

Opening claim text (preview).

What is claimed:

1. A system comprising: a processor; and a memory that stores a program configured to be executed by the processor, the program including instructions to perform acts that: obtain a deep learning model having a plurality of layers, each layer having a plurality of weight matrices; train the deep learning model to determine a value for each weight of each of the plurality of weight matrices that minimizes a loss function through application of training samples to each layer of the plurality of layers, wherein each weight matrix includes a first portion and a second portion, wherein the first portion of each weight matrix is quantized with reduced bit-width weights, wherein the second portion includes full-precision floating point values; and upon completion of the training of the deep learning model, quantize each weight matrix of the plurality of weight matrices with reduced bit-width weights.
2. The system of claim 1, wherein the program includes instructions to perform acts that: generate a codebook for each of the plurality of weight matrices, wherein the codebook includes a plurality of uniformly-distributed range of values.
3. The system of claim 1, wherein the program includes instructions to perform acts that: generate a codebook for each of the plurality of weight matrices, wherein the codebook includes a plurality of centroids, wherein each centroid of the plurality of centroids is generated from K-means clustering of weights of a respective weight matrix.
4. The system of claim 3, wherein the program includes instructions to perform acts that: generate an index matrix that maps a weight of a respective weight matrix into a select one of the centroids of the codebook.
5. The system of claim 1, wherein the program includes instructions to perform acts that: randomly select weights in the first portion of each weight matrix to quantized with reduced bit-widths.
6. The system of claim 1, wherein the reduced bit-width weights are fixed-point integers.
7. The system of claim 1, wherein the reduced bit-width weights are INT4 or INT8 data types.
8. The system of claim 1, wherein the deep learning model is a neural transformer model with attention.
9. A computer-implemented method, comprising: obtaining a deep learning model having a plurality of layers, each layer having a plurality of weight matrices; training the deep learning model to learn values for each weight of the plurality of weight matrices that minimize a loss function by: selecting a first portion of each weight matrix at each layer to quantize; quantizing weights of the first portion of each weight matrix with fixed-point integer representations; performing computations at each layer with the fixed-point integer representations; computing an error loss from the computations; determining a full-precision gradient to update the quantized weights using an estimator; determining a full-precision gradient to update unquantized weights using stochastic gradient descent; and updating the values of the weights of each weight matrix based on the full-precision gradient; and upon completion of the training, quantizing each weight of each weight matrix into a fixed-point integer representation.
10. The method of claim 9, further comprising: decomposing each weight matrix into sub-blocks; and randomly choosing a select one of the sub-blocks as the first portion.
11. The method of claim 9, further comprising: generating a codebook for a first weight matrix, wherein the codebook includes a plurality of uniformly-distributed range of values based on an n-bit representation of the fixed-point integer representation; and mapping a weight of the first weight matrix into a value of the codebook.
12. The method of claim 9, further comprising: generating a codebook for a second weight matrix, wherein the codebook includes a plurality of centroids, wherein each centroid of the plurality of centroids is generated from K-means clustering of weights of the second weight matrix.
13. The method of claim 12, further comprising: generating an index matrix to map a weight of the second weight matrix into the select centroid of the codebook.
14. The method of claim 9, wherein the fixed-point integer representations are INT4 or INT8 data types.
15. The method of claim 9, wherein the deep learning model is a neural transformer model with attention.
16. A device comprising: a processor and a memory; wherein the memory includes instructions that when executed on the processor performs actions that: configure a deep learning model with a plurality of layers, each of the plurality of layers having at least one weight matrix, the at least one weight matrix including a plurality of weights; train the deep learning model to learn to generate source code by computing values for each of the plurality of weights that minimizes an error function, wherein during training of the deep learning model: select a first portion of the at least one weight matrix to quantize with integer data types and selecting a second portion of the at least one weight matrix expressed as full-precision floating point values; determine values for weights of the at least one weight matrix through multiple iterations of a forward pass, backward pass, and weight update using the first portion of weights and the second portion of weights; and upon completion of the training, quantizing all weights of the at least one weight matrix to integer data types.
17. The device of claim 16, wherein the memory includes instructions that when executed on the processor performs actions that: generating a codebook for the at least one weight matrix, wherein the codebook includes a plurality of centroids; and computing the plurality of centroids for the at least one weight matrix using K-means clustering.
18. The device of claim 17, wherein the memory includes instructions that when executed on the processor performs actions that: generating an index matrix that maps a quantized weight of the at least one weight matrix into a centroid.
19. The device of claim 16, wherein the quantized weights are INT4 or INT8 data types.
20. The device of claim 16, wherein the deep learning model is a neural transformer model with attention.
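The training procedure recited in the method claim (quantize a randomly chosen portion of the weights for each forward pass, update the full-precision weights, then quantize everything once training ends) can be sketched on a toy problem. The following is an illustration under stated assumptions, not the claimed implementation: it uses a two-weight linear model instead of a neural transformer, a fixed clipping range of [-2, 2], and a straight-through estimator (the gradient computed against the quantized weight is applied unchanged to its full-precision copy) as one possible choice for the "estimator" the claim mentions. All function names and hyperparameters are hypothetical.

```python
import random


def make_codebook(lo, hi, n_bits):
    """Evenly spaced grid of 2**n_bits values; the range [lo, hi] is an assumption."""
    levels = 2 ** n_bits
    step = (hi - lo) / (levels - 1)
    return [lo + i * step for i in range(levels)]


def quantize(w, codebook):
    """Snap a weight to its nearest codeword."""
    return min(codebook, key=lambda c: abs(w - c))


def loss(weights, samples):
    """Sum of squared errors of a linear model over all samples."""
    return sum((sum(w * x for w, x in zip(weights, xs)) - y) ** 2
               for xs, y in samples)


def train(samples, n_weights, codebook, quant_fraction=0.5,
          lr=0.05, epochs=200, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * n_weights  # full-precision copies kept throughout training
    for _ in range(epochs):
        for xs, y in samples:
            # First portion: weights randomly selected for quantization this step.
            mask = [rng.random() < quant_fraction for _ in weights]
            forward = [quantize(w, codebook) if m else w
                       for w, m in zip(weights, mask)]
            err = sum(w * x for w, x in zip(forward, xs)) - y
            # Straight-through estimator (assumed): the gradient 2 * err * x,
            # computed with the forward-pass weights, updates the
            # full-precision weights directly.
            weights = [w - lr * 2 * err * x for w, x in zip(weights, xs)]
    # After training, every weight is quantized.
    return [quantize(w, codebook) for w in weights]


samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], -0.5), ([1.0, 1.0], 0.5)]
codebook = make_codebook(-2.0, 2.0, n_bits=4)
final_weights = train(samples, n_weights=2, codebook=codebook)
print(final_weights, loss(final_weights, samples))
```

Exposing the model to quantization error on only part of the weights at each step is what the abstract describes as making the model "more resilient to quantization"; the `quant_fraction` parameter here stands in for the size of that first portion.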

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G06N3/0495 (Primary): Quantised networks; Sparse networks; Compressed networks
G06N3/08 (Primary): Learning methods
G06N3/063: using electronic means
G06F18/211: Selection of the most significant subset of features
G06F18/23213: with fixed number of clusters, e.g. K-means clustering

Patent family

Related publications grouped by family.

View patent family 84367224
