Automatic compression of machine learning models

US12566960B2 · US · B2

Patent metadata
Publication number: US-12566960-B2
Application number: US-202217817662-A
Country: US
Kind code: B2
Filing date: Aug 5, 2022
Priority date: Aug 5, 2022
Publication date: Mar 3, 2026
Grant date: Mar 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method for compressing a machine learning model includes converting an input machine learning model into a standard machine learning model. The method further includes converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list. The method further includes determining, for each of the pruned machine learning models, a size-to-error ratio. The method further includes selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list. The method further includes generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected. The method further includes deploying the compressed machine learning model for production.
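The abstract's core loop (prune at each candidate ratio, score each pruned model, pick one ratio, then compress) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not say how size or error is measured, or which way the size-to-error ratio is read. So the dict-of-weight-lists model format, per-layer magnitude pruning, the nonzero-weight size measure, the output-deviation error, and the scoring rule "weights removed divided by error" are all hypothetical choices. The framework-conversion and deployment steps are omitted.

```python
# Minimal sketch of the abstract's prune-score-select loop.
# ASSUMPTIONS (not from the patent): a "standard model" is a dict mapping a layer
# name to a flat list of weights; pruning is per-layer magnitude pruning; size is
# the count of nonzero weights; error is the mean absolute deviation from the
# unpruned model's outputs; the size-to-error score is (weights removed) / error.
from typing import Dict, List, Sequence

Model = Dict[str, List[float]]


def prune(model: Model, ratio: float) -> Model:
    """Zero out the `ratio` fraction of smallest-magnitude weights in each layer."""
    pruned: Model = {}
    for name, weights in model.items():
        k = int(len(weights) * ratio)  # how many weights to drop in this layer
        order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
        drop = set(order[:k])
        pruned[name] = [0.0 if i in drop else w for i, w in enumerate(weights)]
    return pruned


def size(model: Model) -> int:
    """Hypothetical size measure: number of nonzero weights."""
    return sum(1 for weights in model.values() for w in weights if w != 0.0)


def predict(model: Model, x: Sequence[float]) -> float:
    """Toy forward pass: sum of each layer's dot product with the input."""
    return sum(sum(w * xi for w, xi in zip(weights, x)) for weights in model.values())


def error(model: Model, reference: Model, probes: List[List[float]]) -> float:
    """Mean absolute deviation from the reference model on a probe set."""
    gaps = [abs(predict(model, x) - predict(reference, x)) for x in probes]
    return sum(gaps) / len(gaps)


def select_ratio(model: Model, candidates: List[float], probes: List[List[float]]) -> float:
    """Return the candidate ratio with the highest (weights removed) / error score."""
    base = size(model)
    best_ratio, best_score = candidates[0], float("-inf")
    for ratio in candidates:
        pruned = prune(model, ratio)
        # The small epsilon avoids division by zero when pruning changes nothing.
        score = (base - size(pruned)) / (error(pruned, model, probes) + 1e-9)
        if score > best_score:
            best_ratio, best_score = ratio, score
    return best_ratio


# Example usage with made-up weights and a single all-ones probe input.
standard_model: Model = {"fc1": [0.9, -0.05, 0.4, 0.01], "fc2": [-0.7, 0.02, 0.3, -0.6]}
candidate_list = [0.25, 0.5, 0.75]
chosen = select_ratio(standard_model, candidate_list, probes=[[1.0, 1.0, 1.0, 1.0]])
# The patent applies the chosen ratio to the input model; here input == standard model.
compressed = prune(standard_model, chosen)
print(chosen, size(compressed))
```

This score ignores any deployment constraint. Claim 1 additionally conditions the selection on a budget covering available size and latency.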

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for compressing a machine learning model, the computer-implemented method comprising: converting an input machine learning model by translating the input machine learning model, which may be structured using any machine learning framework, into a standard machine learning model that uses a predetermined framework, wherein the predetermined framework provides a common model definition format compatible across heterogeneous model sources; converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list; determining, for each of the pruned machine learning models, a size-to-error ratio; selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list, wherein the selection is based on a budget that comprises an available size for deploying the machine learning model and a latency for deploying the machine learning model; generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected; and deploying the compressed machine learning model for production to an execution system.

2. The computer-implemented method of claim 1, wherein determining the size-to-error ratio for each of the pruned machine learning models comprises determining a size-to-error ratio for each layer of each of the pruned machine learning models, and aggregating the size-to-error ratio for each layer.

3. The computer-implemented method of claim 1, wherein determining, for each of the pruned machine learning models, a size-to-error ratio comprises computing a model evaluation matrix that comprises an evaluation score for each of the pruned machine learning models.

4. The computer-implemented method of claim 1, wherein the compressed machine learning model has a smaller size than the input machine learning model.

5. The computer-implemented method of claim 1, wherein the compressed machine learning model is generated by a first computing system and is deployed to a second computing system that executes the compressed machine learning model.

6. A system comprising: a memory; and a processor coupled to the memory, the processor configured to perform a method comprising: converting an input machine learning model by translating the input machine learning model, which may be structured using any machine learning framework, into a standard machine learning model that uses a predetermined framework, wherein the predetermined framework provides a common model definition format compatible across heterogeneous model sources; converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list; determining, for each of the pruned machine learning models, a size-to-error ratio; selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list, wherein the selection is based on a budget that comprises an available size for deploying the machine learning model and a latency for deploying the machine learning model; generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected; and deploying the compressed machine learning model for production to an execution system.

7. The system of claim 6, wherein determining the size-to-error ratio for each of the pruned machine learning models comprises determining a size-to-error ratio for each layer of each of the pruned machine learning models, and aggregating the size-to-error ratio for each layer.

8. The system of claim 6, wherein determining, for each of the pruned machine learning models, a size-to-error ratio comprises computing a model evaluation matrix that comprises an evaluation score for each of the pruned machine learning models.

9. The system of claim 6, wherein the compressed machine learning model has a smaller size than the input machine learning model.

10. The system of claim 6, wherein the compressed machine learning model is generated by a first computing system and is deployed to a second computing system that executes the compressed machine learning model.

11. A computer program product comprising a non-transitory memory device with computer-executable instructions therein, the computer-executable instructions when executed by a processing unit perform a method comprising: converting an input machine learning model by translating the input machine learning model, which may be structured using any machine learning framework, into a standard machine learning model that uses a predetermined framework, wherein the predetermined framework provides a common model definition format compatible across heterogeneous model sources; converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list; determining, for each of the pruned machine learning models, a size-to-error ratio; selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list, wherein the selection is based on a budget that comprises an available size for deploying the machine learning model and a latency for deploying the machine learning model; generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected; and deploying the compressed machine learning model for production to an execution system.

12. The computer program product of claim 11, wherein determining the size-to-error ratio for each of the pruned machine learning models comprises determining a size-to-error ratio for each layer of each of the pruned machine learning models, and aggregating the size-to-error ratio for each layer.

13. The computer program product of claim 11, wherein determining, for each of the pruned machine learning models, a size-to-error ratio comprises computing a model evaluation matrix that comprises an evaluation score for each of the pruned machine learning models.

14. The computer program product of claim 11, wherein the compressed machine learning model has a smaller size than the input machine learning model.
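Claims 1 to 3 refine the selection step in three ways. Each pruned model's ratio is built from per-layer ratios that are aggregated. The scores form an evaluation matrix with one score per pruned model. The choice is constrained by a budget on available size and latency. The sketch below shows one way these pieces could fit together, assuming the per-layer measurements, model size, and latency have already been measured. The `LayerStats` and `Candidate` types, the unweighted mean as the aggregation rule, "size saved per unit of added error" as the per-layer ratio, and treating the budget as hard caps are hypothetical choices that the claims leave open.

```python
# Sketch of budget-constrained selection over an evaluation matrix (claims 1-3).
# ASSUMPTIONS (not from the patent): per-layer ratio = size saved / error added,
# aggregation = unweighted mean, and the budget is a hard cap on size and latency.
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class LayerStats:
    size_saved: float   # e.g. MB removed from this layer by pruning (hypothetical unit)
    error_added: float  # error increase attributed to this layer (hypothetical metric)


@dataclass
class Candidate:
    ratio: float                   # pruning ratio from the candidate list
    layers: Dict[str, LayerStats]  # per-layer measurements of this pruned model
    size_mb: float                 # total size of the pruned model
    latency_ms: float              # measured or estimated inference latency


Row = Tuple[float, float, float, float]  # (ratio, size_mb, latency_ms, score)


def layer_ratio(stats: LayerStats, eps: float = 1e-9) -> float:
    """Per-layer size-to-error ratio; eps guards against zero added error."""
    return stats.size_saved / (stats.error_added + eps)


def aggregate_score(cand: Candidate) -> float:
    """Aggregate the per-layer ratios into one score for the pruned model."""
    return mean(layer_ratio(s) for s in cand.layers.values())


def evaluation_matrix(cands: List[Candidate]) -> List[Row]:
    """One row per pruned model, holding its evaluation score."""
    return [(c.ratio, c.size_mb, c.latency_ms, aggregate_score(c)) for c in cands]


def select_within_budget(
    cands: List[Candidate], max_size_mb: float, max_latency_ms: float
) -> Optional[float]:
    """Best-scoring ratio whose model fits the budget, or None if none fits."""
    fits = [r for r in evaluation_matrix(cands) if r[1] <= max_size_mb and r[2] <= max_latency_ms]
    if not fits:
        return None
    return max(fits, key=lambda r: r[3])[0]


# Example usage with made-up measurements for three candidate ratios.
candidates = [
    Candidate(0.25, {"conv1": LayerStats(2.0, 0.01), "fc": LayerStats(4.0, 0.02)}, 30.0, 12.0),
    Candidate(0.50, {"conv1": LayerStats(4.0, 0.04), "fc": LayerStats(8.0, 0.05)}, 20.0, 9.0),
    Candidate(0.75, {"conv1": LayerStats(6.0, 0.20), "fc": LayerStats(12.0, 0.30)}, 10.0, 6.0),
]
print(select_within_budget(candidates, max_size_mb=25.0, max_latency_ms=10.0))
```

With a 25 MB / 10 ms budget, the 0.25 candidate exceeds both limits, so the sketch picks 0.5 even though 0.25 has the highest score. When no candidate fits, the sketch returns `None`. The claims shown here do not say how that case is handled.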

Assignees

IBM

Inventors

Classifications

  • Quantised networks; Sparse networks; Compressed networks (CPC title)

  • Validation; Performance evaluation; Active pattern learning techniques (CPC title)

  • Selection of the most significant subset of features (CPC title)

  • G06N3/082 (primary): modifying the architecture, e.g. adding, deleting or silencing nodes or connections

  • Combinations of networks (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?
Publication date: March 3, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).