Neural network architecture pruning
US-2021264278-A1 · Aug 26, 2021 · US
US12566960B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12566960-B2 |
| Application number | US-202217817662-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 5, 2022 |
| Priority date | Aug 5, 2022 |
| Publication date | Mar 3, 2026 |
| Grant date | Mar 3, 2026 |
Official abstract text for this publication.
A computer-implemented method for compressing a machine learning model includes converting an input machine learning model into a standard machine learning model. The method further includes converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list. The method further includes determining, for each of the pruned machine learning models, a size-to-error ratio. The method further includes selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list. The method further includes generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected. The method further includes deploying the compressed machine learning model for production.
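The middle steps of the abstract (one pruned model per candidate ratio, then a size-to-error ratio for each) can be illustrated with a minimal Python sketch. It is not taken from the patent. Magnitude pruning, a single NumPy weight matrix standing in for the "standard" model, nonzero-weight count as the size measure, and mean squared error as the error measure are all assumptions made for illustration, and the function names `magnitude_prune` and `size_to_error_table` are hypothetical.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, ratio: float) -> np.ndarray:
    """Return a copy with the smallest-magnitude `ratio` fraction of weights set to zero."""
    k = int(round(ratio * weights.size))
    pruned = weights.copy()
    if k > 0:
        # Flattened indices of the k entries with the smallest absolute value.
        idx = np.argsort(np.abs(weights), axis=None)[:k]
        np.put(pruned, idx, 0.0)
    return pruned


def size_to_error_table(weights, x, y, candidate_ratios):
    """Build one row per candidate pruning ratio with size, error, and their quotient."""
    rows = []
    for ratio in candidate_ratios:
        pruned = magnitude_prune(weights, ratio)
        size = int(np.count_nonzero(pruned))            # assumed size measure
        error = float(np.mean((x @ pruned - y) ** 2))   # assumed error measure (MSE)
        rows.append({
            "ratio": ratio,
            "size": size,
            "error": error,
            # Guard against division by zero for a perfect fit.
            "size_to_error": size / max(error, 1e-12),
        })
    return rows


# Toy linear model and data (assumed, for demonstration only).
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 1))
x = rng.normal(size=(200, 16))
y = x @ w + 0.1 * rng.normal(size=(200, 1))

table = size_to_error_table(w, x, y, [0.0, 0.25, 0.5, 0.75])
for row in table:
    print(row)
```

The abstract does not say how the ratio is turned into a choice, so the sketch stops at the table; a selection rule would consume these rows.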
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for compressing a machine learning model, the computer-implemented method comprising: converting an input machine learning model by translating the input machine learning model, which may be structured using any machine learning framework, into a standard machine learning model that uses a predetermined framework, wherein the predetermined framework provides a common model definition format compatible across heterogeneous model sources; converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list; determining, for each of the pruned machine learning models, a size-to-error ratio; selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list, wherein the selection is based on a budget that comprises an available size for deploying the machine learning model and a latency for deploying the machine learning model; generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected; and deploying the compressed machine learning model for production to an execution system.

2. The computer-implemented method of claim 1, wherein determining the size-to-error ratio for each of the pruned machine learning models comprises determining a size-to-error ratio for each layer of each of the pruned machine learning models, and aggregating the size-to-error ratio for each layer.

3. The computer-implemented method of claim 1, wherein determining, for each of the pruned machine learning models, a size-to-error ratio comprises computing a model evaluation matrix that comprises an evaluation score for each of the pruned machine learning models.

4. The computer-implemented method of claim 1, wherein the compressed machine learning model has a smaller size than the input machine learning model.

5. The computer-implemented method of claim 1, wherein the compressed machine learning model is generated by a first computing system and is deployed to a second computing system that executes the compressed machine learning model.

6. A system comprising: a memory; and a processor coupled to the memory, the processor configured to perform a method comprising: converting an input machine learning model by translating the input machine learning model, which may be structured using any machine learning framework, into a standard machine learning model that uses a predetermined framework, wherein the predetermined framework provides a common model definition format compatible across heterogeneous model sources; converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list; determining, for each of the pruned machine learning models, a size-to-error ratio; selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list, wherein the selection is based on a budget that comprises an available size for deploying the machine learning model and a latency for deploying the machine learning model; generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected; and deploying the compressed machine learning model for production to an execution system.

7. The system of claim 6, wherein determining the size-to-error ratio for each of the pruned machine learning models comprises determining a size-to-error ratio for each layer of each of the pruned machine learning models, and aggregating the size-to-error ratio for each layer.

8. The system of claim 6, wherein determining, for each of the pruned machine learning models, a size-to-error ratio comprises computing a model evaluation matrix that comprises an evaluation score for each of the pruned machine learning models.

9. The system of claim 6, wherein the compressed machine learning model has a smaller size than the input machine learning model.

10. The system of claim 6, wherein the compressed machine learning model is generated by a first computing system and is deployed to a second computing system that executes the compressed machine learning model.

11. A computer program product comprising a non-transitory memory device with computer-executable instructions therein, the computer-executable instructions when executed by a processing unit perform a method comprising: converting an input machine learning model by translating the input machine learning model, which may be structured using any machine learning framework, into a standard machine learning model that uses a predetermined framework, wherein the predetermined framework provides a common model definition format compatible across heterogeneous model sources; converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list; determining, for each of the pruned machine learning models, a size-to-error ratio; selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list, wherein the selection is based on a budget that comprises an available size for deploying the machine learning model and a latency for deploying the machine learning model; generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected; and deploying the compressed machine learning model for production to an execution system.

12. The computer program product of claim 11, wherein determining the size-to-error ratio for each of the pruned machine learning models comprises determining a size-to-error ratio for each layer of each of the pruned machine learning models, and aggregating the size-to-error ratio for each layer.

13. The computer program product of claim 11, wherein determining, for each of the pruned machine learning models, a size-to-error ratio comprises computing a model evaluation matrix that comprises an evaluation score for each of the pruned machine learning models.

14. The computer program product of claim 11, wherein the compressed machine learning model has a smaller size than the input machine learning model.
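The budget-constrained selection in the independent claims and the per-layer aggregation in claims 2, 7, and 12 could look roughly like the following Python sketch. The claims do not state how per-layer ratios are aggregated or how the budget and the ratio interact, so two assumptions are made here: aggregation is a plain mean, and the selected candidate is the one with the highest aggregated ratio among those that fit both the size and the latency budget. The `Candidate` class, `select_pruning_ratio`, and every number in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    pruning_ratio: float
    layer_sizes: list[int]       # size per layer, e.g. in kilobytes (made-up values)
    layer_errors: list[float]    # error attributed to each layer (made-up values)
    latency_ms: float            # measured inference latency (made-up values)

    @property
    def size(self) -> int:
        return sum(self.layer_sizes)

    def size_to_error(self) -> float:
        # Per-layer size-to-error ratios, aggregated by mean (an assumption).
        ratios = [s / max(e, 1e-12)
                  for s, e in zip(self.layer_sizes, self.layer_errors)]
        return sum(ratios) / len(ratios)


def select_pruning_ratio(candidates: list[Candidate],
                         max_size: int,
                         max_latency_ms: float) -> Optional[float]:
    """Pick a pruning ratio from candidates that satisfy the size and latency budget."""
    feasible = [c for c in candidates
                if c.size <= max_size and c.latency_ms <= max_latency_ms]
    if not feasible:
        return None  # no candidate fits the budget
    # Assumed rule: highest aggregated size-to-error ratio among feasible candidates.
    best = max(feasible, key=lambda c: c.size_to_error())
    return best.pruning_ratio


candidates = [
    Candidate(0.0, [400, 600], [0.02, 0.03], 12.0),
    Candidate(0.3, [280, 420], [0.025, 0.035], 9.0),
    Candidate(0.6, [160, 240], [0.05, 0.08], 6.0),
    Candidate(0.9, [40, 60], [0.2, 0.3], 3.0),
]

chosen = select_pruning_ratio(candidates, max_size=750, max_latency_ms=10.0)
print("selected pruning ratio:", chosen)
```

Filtering by budget first and ranking second keeps the two criteria separate, which makes it easy to swap in a different ranking rule if the full description defines one.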
Quantised networks; Sparse networks; Compressed networks · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title
Selection of the most significant subset of features · CPC title
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
Combinations of networks · CPC title