Memory efficient neural networks

US2020082269A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2020082269-A1
Application number	US-201916373447-A
Country	US
Kind code	A1
Filing date	Apr 2, 2019
Priority date	Sep 12, 2018
Publication date	Mar 12, 2020
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment of a method includes performing one or more activation functions in a neural network using weights that have been quantized from floating point values to values that are represented using fewer bits than the floating point values. The method further includes performing a first quantization of the weights from the floating point values to the values that are represented using fewer bits than the floating point values after the floating point values are updated using a first number of forward-backward passes of the neural network using training data. The method further includes performing a second quantization of the weights from the floating point values to the values that are represented using fewer bits than the floating point values after the floating point values are updated using a second number of forward-backward passes of the neural network following the first quantization of the weights.
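
The schedule described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the names `quantize`, `quantization_passes`, and `train` are hypothetical, the uniform 8-bit quantizer over [-1, 1] is a stand-in for whatever lower-bit format is actually used, and `offset` and `frequency` follow the hyperparameter names that appear in the claims. The sketch keeps full-precision weights for updates and refreshes a quantized copy after the first `offset` forward-backward passes and then after every further `frequency` passes.

```python
def quantize(value, num_bits=8, lo=-1.0, hi=1.0):
    """Snap a float to the nearest of 2**num_bits evenly spaced levels in [lo, hi].

    Assumption: a uniform quantizer is used here only for illustration.
    """
    steps = 2 ** num_bits - 1
    clipped = min(max(value, lo), hi)
    width = (hi - lo) / steps
    return lo + round((clipped - lo) / width) * width


def quantization_passes(total_passes, offset, frequency):
    """Return the 1-based pass counts after which the weights are quantized.

    The first quantization follows `offset` passes; each later one follows
    another `frequency` passes. Both must be positive integers.
    """
    return list(range(offset, total_passes + 1, frequency))


def train(weights, gradient_fn, total_passes, offset, frequency, lr=0.1):
    """Toy training loop: update float weights, quantize a copy on schedule.

    `gradient_fn` is a hypothetical callback that stands in for one
    forward-backward pass and returns one gradient per weight.
    """
    scheduled = set(quantization_passes(total_passes, offset, frequency))
    quantized = None  # stays None if no quantization pass is reached
    for completed in range(1, total_passes + 1):
        grads = gradient_fn(weights)
        weights = [w - lr * g for w, g in zip(weights, grads)]
        if completed in scheduled:
            quantized = [quantize(w) for w in weights]
    return weights, quantized
```

For example, `quantization_passes(10, 4, 3)` returns `[4, 7, 10]`: one quantization after the first four passes, then one after every three passes that follow.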

First claim

Opening claim text (preview).

What is claimed is:
1. A processor comprising: one or more arithmetic logic units (ALUs) to perform one or more activation functions in a neural network using weights that have been converted from a first floating point value representation to a second floating point value representation having fewer bits than the first floating point value representation.
2. The processor of claim 1, wherein the one or more ALUs further perform one or more activation functions in the neural network by applying the weights to activation inputs that have been converted from the first floating point value representation to the second floating point value representation.
3. The processor of claim 1, wherein the weights are converted by: performing a first quantization of the weights from the first floating point value representation to the second floating point value representation after the weights are updated using a first number of forward-backward passes of training the neural network; and performing a second quantization of the weights from the first floating point value representation to the second floating point value representation after the weights are updated using a second number of forward-backward passes of training the neural network following the first quantization of the weight.
4. The processor of claim 3, wherein the first number of forward-backward passes is determined based on an offset hyperparameter associated with training the neural network.
5. The processor of claim 3, wherein the second number of forward-backward passes is determined based on a frequency hyperparameter associated with training the neural network.
6. The processor of claim 1, wherein the weights are converted by: freezing a first portion of the weights in a first one or more layers of the neural network; and modifying a second portion of the weights in a second one or more layers of the neural network.
7. The processor of claim 6, wherein an output of the first one or more layers is quantized prior to modifying the second portion of the weights in the second one or more layers.
8. The processor of claim 6, wherein the weights are converted by: after the second portion of the weights is modified, freezing the second portion of the weights in the second one or more layers of the neural network; and modifying a third portion of the weights in a third one or more layers of the neural network following the second one or more layers.
9. The processor of claim 6, wherein modifying the second portion of the weights comprises: updating the floating point values in the second portion of the weights based at least on an output of the first one or more layers; and converting the second portion of the weights from the first floating point value representation to the second floating point value representation.
10. A method, comprising: training one or more neural networks, wherein training the one or more neural networks includes converting weight parameters from a first floating point value representation to a second floating point value representation having fewer bits than the first floating point value representation.
11. The method of claim 10, wherein converting the weight parameters comprises: performing a first quantization of the weight parameters from the first floating point value representation to the second floating point value representation after the weight parameters are updated using a first number of forward-backward passes of training the one or more neural networks; and performing a second quantization of the weight parameters from the first floating point value representation to the second floating point value representation after the weight parameters are updated using a second number of forward-backward passes of training the one or more neural networks following the first quantization of the weight parameters.
12. The method of claim 11, further comprising: determining the first number of forward-backward passes based on an offset hyperparameter associated with the training of the one or more neural networks.
13. The method of claim 11, further comprising: determining the second number of forward-backward passes based on a frequency hyperparameter associated with the training of the one or more neural networks.
14. The method of claim 10, wherein converting the weight parameters comprises: freezing a first portion of the weight parameters in a first one or more layers of the one or more neural networks; and modifying a second portion of the weight parameters in a second one or more layers of the one or more neural networks that follow the first one or more layers.
15. The method of claim 14, further comprising quantizing an output of the first one or more layers prior to modifying the second portion of the weight parameters in the second one or more layers.
16. The method of claim 14, further comprising: after the second portion of the weight parameters is modified, freezing the second portion of the weight parameters in the second one or more layers of the one or more neural networks; and modifying a third portion of the weight parameters in a third one or more layers of the one or more neural networks that follow the second one or more layers.
17. The method of claim 14, wherein modifying the second portion of the weight parameters comprises: updating the floating point values in the second portion of the weight parameters based at least on an output of the first one or more layers; and converting the second portion of the weight parameters from the first floating point value representation to the second floating point value representation.
18. The method of claim 14, wherein the first one or more layers of the neural network comprise a convolutional layer, a batch normalization layer, and an activation layer.
19. The method of claim 10, wherein the weight parameters are associated with a fully connected layer in the neural network.
20. A system comprising: one or more computers including one or more processors to train one or more neural networks, wherein training the one or more neural networks includes converting weight parameters from a first floating point value representation to a second floating point value representation having fewer bits than the first floating point value representation.
21. The system of claim 20, wherein converting the weight parameters comprises: performing a first quantization of the weight parameters from the first floating point value representation to the second floating point value representation after the weight parameters are updated using a first number of forward-backward passes of training the one or more neural networks; and performing a second quantization of the weight parameters from the first floating point value representation to the second floating point value representation after the weight parameters are updated using a second number of forward-backward passes of training the one or more neural networks following the first quantization of the weight parameters.
22. The system of claim 21, wherein the first number of forward-backward passes is based on an offset hyperparameter associated with the training of the one or more neural networks.
23. The system of claim 21, wherein the second number of forward-backward passes is based on a frequency hyperparameter associated with the training of the one or more n
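
The layer-by-layer conversion recited in the processor and method claims (freeze earlier layers, then update and convert the next ones) might look roughly like the sketch below. It rests on several assumptions: `to_half`, `progressive_convert`, and the `fine_tune` callback are hypothetical names, IEEE 754 half precision is chosen only as one example of a floating point representation with fewer bits, every layer (including the first) is fine-tuned before conversion, and the quantization of layer outputs mentioned in the claims is not modeled.

```python
import struct


def to_half(value):
    """Round a Python float (64-bit) to the nearest IEEE 754 half-precision value.

    struct's "e" format packs a 16-bit float; values outside the half-precision
    range raise OverflowError.
    """
    return struct.unpack("e", struct.pack("e", value))[0]


def progressive_convert(layers, fine_tune):
    """Convert layers one at a time, freezing each layer once it is converted.

    layers:    list of layers, each a list of float weights.
    fine_tune: hypothetical callback (frozen_layers, layer) -> updated weights,
               standing in for training passes that update the current layer
               using the output of the already frozen layers.
    """
    converted = []  # frozen layers: never modified after being appended
    for layer in layers:
        # Pass copies so the callback cannot alter the frozen weights.
        frozen_copy = [list(frozen) for frozen in converted]
        updated = fine_tune(frozen_copy, list(layer))
        converted.append([to_half(w) for w in updated])
    return converted
```

A caller would supply its own `fine_tune`; for a quick check, an identity callback such as `lambda frozen, layer: layer` simply converts every layer to half precision in order.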

Assignees

Nvidia Corp

Inventors

Classifications

G06F2207/4824
Neural networks · CPC title
G06F7/483 · Primary
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title
G06N3/084
Backpropagation, e.g. using gradient descent · CPC title
G06N5/04
Inference or reasoning models · CPC title
G06F7/57
Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations {(G06F7/49, G06F7/491 take precedence)} · CPC title

Patent family

Related publications grouped by family.

View patent family 69718803

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020082269A1 cover?: One embodiment of a method includes performing one or more activation functions in a neural network using weights that have been quantized from floating point values to values that are represented using fewer bits than the floating point values. The method further includes performing a first quantization of the weights from the floating point values to the values that are represented using fewe…
Who is the assignee on this patent?: Nvidia Corp
What technology area does this patent fall under?: Primary CPC classification G06F7/483. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 12, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).