System for reducing transaction failure
US-12175472-B2 · Dec 24, 2024 · US
US2025094864A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025094864-A1 |
| Application number | US-202418602951-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 12, 2024 |
| Priority date | Sep 14, 2023 |
| Publication date | Mar 20, 2025 |
| Grant date | — |
Abstract
Machine learning is a process that learns a model from a given dataset, where the model can then be used to make a prediction about new data. In order to reduce the size, computation, and latency of a machine learning model, a compression technique can be employed which includes model sparsification and quantization. To limit the extent to which the quality of the model is impacted when uniformly applying sparsification and quantization to all values of the model, the present disclosure provides for a hybrid sparsification and quantization of the model.
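The abstract's core idea is to give inliers and outliers different precision instead of quantizing every value uniformly. The sketch below illustrates that idea under stated assumptions. It uses symmetric per-tensor integer quantization, a magnitude threshold to separate outliers, and 4-bit/8-bit widths. None of these specifics come from the publication. The index-to-value dictionaries stand in for sparse storage and do not impose the structured sparse pattern the claims also require. The names `quantize`, `hybrid_quantize`, and `mse` are hypothetical.

```python
import random

# Hypothetical sketch: hybrid inlier/outlier quantization versus uniform
# quantization. Symmetric per-tensor quantization, the magnitude threshold,
# and the 4-bit/8-bit widths are illustrative assumptions.


def quantize(values, bits):
    """Symmetric uniform quantization to signed `bits`-bit integers, returned dequantized."""
    if not values:
        return []
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits, 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        return [0.0] * len(values)
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def hybrid_quantize(weights, threshold, inlier_bits=4, outlier_bits=8):
    """Split weights into separate index->value maps and quantize each at its own bit width."""
    inliers = {i: w for i, w in enumerate(weights) if abs(w) <= threshold}
    outliers = {i: w for i, w in enumerate(weights) if abs(w) > threshold}
    q_in = dict(zip(inliers, quantize(list(inliers.values()), inlier_bits)))
    q_out = dict(zip(outliers, quantize(list(outliers.values()), outlier_bits)))
    merged = {**q_in, **q_out}
    return [merged[i] for i in range(len(weights))]


def mse(a, b):
    """Mean squared reconstruction error."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic weights: mostly small values plus two large outliers.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(1000)] + [1.0, -0.8]

uniform_err = mse(weights, quantize(weights, 4))
hybrid_err = mse(weights, hybrid_quantize(weights, threshold=0.1))
print(f"uniform 4-bit MSE: {uniform_err:.2e}")
print(f"hybrid 4/8-bit MSE: {hybrid_err:.2e}")
```

With uniform 4-bit quantization, the step size is set by the largest weight, so most small weights round to zero. The hybrid version spends 8 bits only on the two outliers. This lets the inliers' 4-bit grid match their own, much narrower range, which in this synthetic setup should lower the error substantially. The sketch ignores the storage overhead of the indices.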
Claims (preview)
What is claimed is:
1. A method, comprising: at a device, compressing a machine learning model having a plurality of values to reduce at least one of a size of the machine learning model or computation requirements of the machine learning model, by: processing the machine learning model to generate a plurality of sparse data structures including: storing inlier values of the machine learning model in a first data structure, and storing outlier values of the machine learning model in a second data structure, wherein at least one of the first data structure or the second data structure has a structured sparse pattern; and non-uniformly quantizing the machine learning model, including: quantizing the first data structure storing the inlier values to a first bit width, and quantizing the second data structure storing the outlier values to a second bit width that is different from the first bit width.
2. The method of claim 1, wherein the inlier values and the outlier values are weights of the machine learning model.
3. The method of claim 1, wherein the inlier values and the outlier values are determined according to a defined threshold metric.
4. The method of claim 1, wherein the first data structure has a first structured sparse pattern that has less sparsity than a second structured sparse pattern of the second data structure.
5. The method of claim 1, wherein the first data structure has a first structured sparse pattern that is the same as a second structured sparse pattern of the second data structure.
6. The method of claim 1, wherein the first bit width and the second bit width are supported by different hardware accelerators.
7. A method, comprising: at a device: apportioning a plurality of different subsets of values of the machine learning model into a plurality of data structures at least one of which has a defined structured sparse pattern; and changing a data representation of at least one data structure of the plurality of data structures, wherein at least two data structures of the plurality of data structures have different data representations.
8. The method of claim 7, wherein the machine learning model is a deep neural network.
9. The method of claim 7, wherein the machine learning model is a large language model (LLM).
10. The method of claim 7, wherein the values of the machine learning model are weights of the machine learning model.
11. The method of claim 7, wherein the plurality of data structures are tensors.
12. The method of claim 7, wherein at least two of the plurality of data structures have different defined structured sparse patterns.
13. The method of claim 12, wherein the different defined structured sparse patterns include at least: a first defined structured sparse pattern having a first sparsity degree, and a second defined structured sparse pattern having a second sparsity degree, wherein the first sparsity degree is different from the second sparsity degree.
14. The method of claim 7, wherein the plurality of different subsets of values of the machine learning model include: at least one subset comprised of at least a portion of inlier values of the machine learning model, and at least another subset comprised of at least a portion of outlier values of the machine learning model.
15. The method of claim 14, wherein the inlier values and the outlier values are determined according to a defined threshold metric.
16. The method of claim 15, wherein the defined threshold metric is a magnitude of weight.
17. The method of claim 15, wherein the defined threshold metric is an error after quantization for inlier and outlier.
18. The method of claim 15, wherein the defined threshold metric is a product of a corresponding weight and activation.
19. The method of claim 14, wherein at least a portion of the inlier values are stored with a first structured sparse pattern that has less sparsity than a second structured sparse pattern used to store at least a portion of the outlier values.
20. The method of claim 7, wherein the machine learning model is further compressed by: sparsifying the machine learning model by pruning values from the machine learning model, to form a sparse machine learning model, wherein the plurality of different subsets of values of the machine learning model are determined from the sparse machine learning model.
21. The method of claim 20, wherein the values of the machine learning model are selected for pruning according to a defined threshold metric.
22. The method of claim 21, wherein the defined threshold metric is a magnitude of weight.
23. The method of claim 21, wherein the defined threshold metric is an error after pruning.
24. The method of claim 21, wherein the defined threshold metric is a product of a corresponding weight and activation obtained with training or validation data.
25. The method of claim 20, wherein the machine learning model is sparsified to a defined degree of sparsity.
26. The method of claim 20, wherein the machine learning model is sparsified with a defined structured sparse pattern.
27. The method of claim 7, wherein changing the data representation of the at least one data structure includes quantizing the at least one data structure.
28. The method of claim 7, wherein the data representation of the plurality of data structures includes a bit width of the plurality of data structures.
29. The method of claim 7, wherein the data representation of the plurality of data structures includes a data type of the plurality of data structures.
30. The method of claim 7, wherein the different data representations are supported by a single hardware accelerator or multiple different hardware accelerators.
31. The method of claim 7, wherein at least two of the plurality of data structures have different defined structured sparse patterns, and wherein the different defined structured sparse patterns are supported by a single hardware accelerator or multiple different hardware accelerators.
32. A system, comprising: a non-transitory memory storage comprising instructions; and one or more processors in communication with the memory, wherein the one or more processors execute the instructions to at least one of compress a machine learning model or reduce a computation of the machine learning model by: apportioning a plurality of different subsets of values of the machine learning model into a plurality of data structures at least one of which has a defined structured sparse pattern; and changing a data representation of at least one data structure of the plurality of data structures, wherein at least two data structures of the plurality of data structures have different data representations.
33. The system of claim 32, wherein the machine learning model is a deep neural network.
34. The system of claim 32, wherein the machine learning model is a large language model (LLM).
35. The system of claim 32, wherein the values of the machine learning model are weights of the machine learning model.
36. The system of claim 32, wherein at least two of the plurality of d
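Claims 19 to 26 combine structured pruning with an inlier/outlier split. The sketch below is a hypothetical illustration of how those pieces can fit together, not the claimed implementation. It first prunes to a 2:4 pattern, keeping the two highest-scoring values in each contiguous group of four. Scores are either the weight magnitude (as in claim 22) or |weight × activation| (as in claim 24). It then moves at most one above-threshold value per group into an outlier structure. After that step, the inlier structure fits a 2:4 pattern and the outlier structure fits a sparser 1:4 pattern, which is the relationship claim 19 describes. The group size, the n value, the threshold, and all function names are assumptions.

```python
# Hypothetical sketch of claims 19-26: N:M structured pruning, then an
# inlier/outlier split. Group size m=4, n=2 and the threshold are assumptions.


def scores(weights, activations=None):
    """Score by |w| (cf. claim 22) or, if activations are given, |w * a| (cf. claim 24)."""
    if activations is None:
        return [abs(w) for w in weights]
    return [abs(w * a) for w, a in zip(weights, activations)]


def prune_n_of_m(weights, n, m, activations=None):
    """Keep the n highest-scoring values in each contiguous group of m; zero the rest."""
    if len(weights) % m:
        raise ValueError("number of weights must be a multiple of m")
    s = scores(weights, activations)
    pruned = [0.0] * len(weights)
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        for i in sorted(group, key=lambda j: s[j], reverse=True)[:n]:
            pruned[i] = weights[i]
    return pruned


def apportion(pruned, m, threshold):
    """Move at most one value per group (the largest, if above threshold) to the outlier structure."""
    inliers, outliers = list(pruned), [0.0] * len(pruned)
    for start in range(0, len(pruned), m):
        top = max(range(start, start + m), key=lambda i: abs(pruned[i]))
        if abs(pruned[top]) > threshold:
            outliers[top], inliers[top] = pruned[top], 0.0
    return inliers, outliers


def max_nonzeros_per_group(values, m):
    """Largest count of nonzero entries in any contiguous group of m."""
    return max(sum(1 for v in values[s:s + m] if v != 0.0)
               for s in range(0, len(values), m))


weights = [0.9, 0.01, -0.05, 0.02, 0.03, -0.04, 0.0, 1.2]
pruned = prune_n_of_m(weights, n=2, m=4)
inliers, outliers = apportion(pruned, m=4, threshold=0.5)
print(pruned)
print(inliers, outliers)
```

`apportion` removes only the single largest value from each group. As a result, the outlier structure holds at most one nonzero per four slots, and the inliers never exceed the two-per-four budget that pruning set. Passing `activations` to `prune_n_of_m` changes which values survive. A large weight paired with a near-zero activation can be pruned ahead of a smaller weight with a large activation.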