Bit width selection for fixed point neural networks

US10262259B2 · US · B2

Patent metadata
Publication number: US-10262259-B2
Application number: US-201514936594-A
Country: US
Kind code: B2
Filing date: Nov 9, 2015
Priority date: May 8, 2015
Publication date: Apr 16, 2019
Grant date: Apr 16, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for selecting bit widths for a fixed point machine learning model includes evaluating a sensitivity of model accuracy to bit widths at each computational stage of the model. The method also includes selecting a bit width for parameters and/or intermediate calculations in the computational stages of the model. The bit width for the parameters and the bit width for the intermediate calculations may be different. The selected bit width may be determined based on the sensitivity evaluation.
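The per-stage sensitivity evaluation described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the helper names (`quantize`, `sqnr_db`, `forward`, `select_bit_widths`), the toy ReLU network, the candidate bit widths, the 30 dB target, and the assumption that weights lie in [-1, 1) (so that all bits but the sign bit are fractional) are all hypothetical choices made for illustration. The sketch quantizes the weights of one layer at a time, measures the output SQNR against a floating point reference, and keeps the smallest candidate bit width that meets the target.

```python
import math
import random

def quantize(x, bit_width, frac_bits):
    """Round x to the nearest value on a signed fixed-point grid, saturating at the range limits."""
    scale = 2 ** frac_bits
    lo = -(2 ** (bit_width - 1))
    hi = 2 ** (bit_width - 1) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

def sqnr_db(reference, approx):
    """Ratio of signal power to quantization-noise power, in decibels."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10 * math.log10(signal / noise)

def forward(layers, x, bits=None):
    """Run a small ReLU network; bits[i], if set, is the weight bit width for layer i."""
    for i, w in enumerate(layers):
        b = bits[i] if bits else None
        if b is not None:
            # Assumption: weights lie in [-1, 1), so bit_width - 1 fractional bits.
            w = [[quantize(v, b, b - 1) for v in row] for row in w]
        x = [sum(wv * xv for wv, xv in zip(row, x)) for row in w]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
    return x

def select_bit_widths(layers, inputs, candidates, target_db):
    """For each layer, pick the smallest candidate bit width whose output SQNR meets target_db."""
    reference = [v for x in inputs for v in forward(layers, x)]
    chosen = []
    for i in range(len(layers)):
        pick = max(candidates)  # fall back to the widest candidate
        for b in sorted(candidates):
            bits = [None] * len(layers)
            bits[i] = b  # quantize only layer i to isolate its sensitivity
            out = [v for x in inputs for v in forward(layers, x, bits)]
            if sqnr_db(reference, out) >= target_db:
                pick = b
                break
        chosen.append(pick)
    return chosen

random.seed(0)
candidates = [4, 6, 8, 12, 16]
layers = [[[random.uniform(-0.9, 0.9) for _ in range(8)] for _ in range(8)]
          for _ in range(3)]
inputs = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(32)]
widths = select_bit_widths(layers, inputs, candidates, target_db=30.0)
print("selected weight bit width per layer:", widths)
```

Quantizing one layer at a time keeps the measurement of each layer's sensitivity separate from the others; a joint search over all layers would be more accurate but considerably more expensive.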

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for selecting bit widths for values of a fixed point machine learning model stored in a memory of a neural computing device, comprising: applying, to an input received at the neural computing device, the model to classify the input; evaluating, while applying the model to the input, an amount of system resources for the neural computing device and a sensitivity of model accuracy to bit widths at a computational stage of the model; dynamically selecting a new bit width for values corresponding to one or more of parameters and intermediate calculations in the computational stage of the model based at least in part on at least one of the amount of system resources, the model accuracy, or a combination thereof being less than a threshold; and applying the model to classify the input with the new bit width.

2. The method of claim 1, in which the model accuracy comprises a signal quantization to noise ratio (SQNR) at an output of the model or classification accuracy.

3. The method of claim 1, in which: the model comprises a neural network and the computational stage is a layer of the neural network; the parameters comprise one or more of bias values and weights; and the intermediate calculations comprise activation values.

4. The method of claim 3, in which the new bit width is based at least in part on connectivity of the network.

5. The method of claim 4, in which the connectivity comprises a fully connected configuration, a convolutional configuration, or a configuration with a specific sparsity.

6. The method of claim 5, in which a bit width for a fully connected layer is less than a bit width for a convolutional layer of the neural network.

7. The method of claim 6, in which the weights and/or the bias values of the fully connected layer and the convolutional layer are random in a transfer learning arrangement.

8. The method of claim 3, in which selecting of the new bit width is based at least in part on whether the new bit width is for a bias value, weight, or activation value.

9. The method of claim 3, in which the new bit width for one or more of the bias values, the weights, and the activation values is based at least in part on a number of weights per layer, a number of activation values per layer, filter size per layer, filter stride per layer, and number of filters per layer in the neural network.

10. The method of claim 3, further comprising fine-tuning the network after selecting one or more of the new bit width for the bias values, the activation values, and the weights of each layer.

11. The method of claim 1, in which a bit width for the intermediate calculations of the computational stage is less than a bit width for the parameters in the computational stage.

12. The method of claim 1, further comprising: injecting noise into the computational stage of the model; determining a model accuracy for the computational stage of the injected noise; and selecting a level of injected noise that provides a desired level of model accuracy.

13. The method of claim 1, further comprising dynamically selecting the new bit width based at least in part on performance specifications or user input.

14. The method of claim 1, in which an output layer uses a floating point number format.

15. A neural computing device for selecting bit widths for values of a fixed point machine learning model, comprising: a memory; and at least one processor coupled to the memory, the at least one processor configured: to apply, to an input received at the neural computing device, the model to classify the input; to evaluate, while applying the model to the input, an amount of system resources for the neural computing device and a sensitivity of model accuracy to bit widths at a computational stage of the model; to dynamically select a new bit width for values corresponding to one or more of parameters and intermediate calculations in the computational stage of the model based at least in part on at least one of the amount of system resources, the model accuracy, or a combination thereof being less than a threshold, the values stored in the memory; and to apply the model to classify the input with the new bit width.

16. The neural computing device of claim 15, in which the model accuracy comprises a signal quantization to noise ratio (SQNR) at an output of the model or classification accuracy.

17. The neural computing device of claim 15, in which: the model comprises a neural network and the computational stage is a layer of the neural network; the parameters comprise one or more of bias values and weights; and the intermediate calculations comprise activation values.

18. The neural computing device of claim 17, in which the at least one processor is further configured to select the new bit width based at least in part on connectivity of the network.

19. The neural computing device of claim 18, in which the connectivity comprises a fully connected configuration, a convolutional configuration or a configuration with a specific sparsity.

20. The neural computing device of claim 19, in which a bit width for a fully connected layer is less than a bit width for a convolutional layer of the neural network.

21. The neural computing device of claim 20, in which one or more of the weights or the bias values of the fully connected layer and the convolutional layer are random in a transfer learning arrangement.

22. The neural computing device of claim 17, in which the at least one processor is further configured to select the new bit width based at least in part on whether the new bit width is for a bias value, weight, or activation value.

23. The neural computing device of claim 17, in which the at least one processor is further configured to select the new bit width for one or more of the bias values, the weights, and the activation values based at least in part on a number of weights per layer, a number of activation values per layer, filter size per layer, filter stride per layer, and number of filters per layer in the neural network.

24. The neural computing device of claim 17, in which the at least one processor is further configured to fine-tune the network after selecting one or more of the new bit width for the bias values, the activation values, and the weights of each layer.

25. The neural computing device of claim 15, in which the at least one processor is further configured to select a bit width for the intermediate calculations of the computational stage to be less than a bit width for the parameters in the computational stage.

26. The neural computing device of claim 15, in which the at least one processor is further configured: to inject noise into the computational stage of the model; to determine a model accuracy for the computational stage of the injected noise; and to select a level of injected noise that provides a desired level of model accuracy.

27. The neural computing device of claim 15, in which the at least one processor is further configured to dynamically select the new bit width based at least in part on performance specifications or user input.

28. The neural computing device of claim 15, in which an output layer of the model uses a floating point number format.

29. An apparatus for selecting bit widths for values of a fixed point machine
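The noise-injection step recited in claims 12 and 26 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the `stage` function, its weights, the 30 dB target, the use of uniform noise, and the mapping from a noise level to a number of fractional bits (step = 2^-frac_bits, treating the injected noise as a stand-in for rounding error) are all hypothetical. The sketch injects noise of decreasing amplitude into the inputs of one stage and returns the coarsest level whose output SQNR still meets the target.

```python
import math
import random

def stage(x):
    """Hypothetical computational stage: a fixed weighted sum followed by ReLU."""
    weights = [0.5, -0.25, 0.75, 0.125]
    return max(0.0, sum(w * v for w, v in zip(weights, x)))

def output_sqnr_db(inputs, step, rng):
    """Output SQNR (dB) when uniform noise in [-step/2, step/2] is added to each stage input."""
    signal = 0.0
    noise = 0.0
    for x in inputs:
        clean = stage(x)
        noisy = stage([v + rng.uniform(-step / 2, step / 2) for v in x])
        signal += clean ** 2
        noise += (clean - noisy) ** 2
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10 * math.log10(signal / noise)

def largest_tolerable_step(inputs, target_db, max_frac_bits=16, seed=0):
    """Return (frac_bits, step) for the coarsest noise level that meets target_db.

    Falls back to max_frac_bits if no tested level reaches the target.
    """
    for frac_bits in range(1, max_frac_bits + 1):
        step = 2.0 ** -frac_bits
        # Re-seed per level so every level sees the same underlying random draws.
        if output_sqnr_db(inputs, step, random.Random(seed)) >= target_db:
            return frac_bits, step
    return max_frac_bits, 2.0 ** -max_frac_bits

data_rng = random.Random(1)
inputs = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
frac_bits, step = largest_tolerable_step(inputs, target_db=30.0)
print("fractional bits:", frac_bits, "noise step:", step)
```

Injecting noise avoids re-quantizing the model for every candidate format: the tolerable noise level is found first and only then translated into a bit width.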

Assignees

Qualcomm Inc

Inventors

Classifications

  • G06N3/08 (Primary): Learning methods

  • Interfaces, programming languages or software development kits, e.g. for simulating neural networks

  • G06N3/063 (Primary): using electronic means

  • for solving equations {, e.g. nonlinear equations, general mathematical optimization problems (optimization specially adapted for a specific administrative, business or logistic context G06Q10/04)}

  • Quantised networks; Sparse networks; Compressed networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10262259B2 cover?
A method for selecting bit widths for a fixed point machine learning model includes evaluating a sensitivity of model accuracy to bit widths at each computational stage of the model. The method also includes selecting a bit width for parameters and/or intermediate calculations in the computational stages of the model. The bit width for the parameters and the bit width for the intermediate calcu…
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 16, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).