What technology area does this patent fall under?

Primary CPC classification G06N3/082. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 3, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Automatic compression of machine learning models

US12566960B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12566960-B2
Application number	US-202217817662-A
Country	US
Kind code	B2
Filing date	Aug 5, 2022
Priority date	Aug 5, 2022
Publication date	Mar 3, 2026
Grant date	Mar 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method for compressing a machine learning model includes converting an input machine learning model into a standard machine learning model. The method further includes converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list. The method further includes determining, for each of the pruned machine learning models, a size-to-error ratio. The method further includes selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list. The method further includes generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected. The method further includes deploying the compressed machine learning model for production.
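The selection loop described in the abstract can be illustrated with a small sketch. This is not the patented implementation. The page does not say how the size-to-error ratio is computed, so the sketch assumes it is the number of weights removed divided by the added mean squared error, and that the highest value wins. The toy linear model, magnitude-based pruning, and all function names are hypothetical stand-ins.

```python
from typing import Dict, List, Sequence, Tuple


def prune_by_magnitude(weights: Sequence[float], ratio: float) -> List[float]:
    """Zero out the `ratio` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * ratio)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Toy linear model: dot product of weights and one input vector."""
    return sum(w * xi for w, xi in zip(weights, x))


def mean_squared_error(reference, pruned, inputs) -> float:
    """Error of the pruned model measured against the unpruned model."""
    diffs = [(predict(reference, x) - predict(pruned, x)) ** 2 for x in inputs]
    return sum(diffs) / len(diffs)


def size_to_error(reference, pruned, inputs, eps: float = 1e-9) -> float:
    """ASSUMED definition: weights removed per unit of added error."""
    removed = sum(1 for w in pruned if w == 0.0) - sum(1 for w in reference if w == 0.0)
    return removed / (mean_squared_error(reference, pruned, inputs) + eps)


def select_ratio(weights, candidates, inputs) -> Tuple[float, Dict[float, float]]:
    """Prune once per candidate ratio, score each result, keep the best score."""
    scores = {
        r: size_to_error(weights, prune_by_magnitude(weights, r), inputs)
        for r in candidates
    }
    return max(scores, key=scores.get), scores


# Hypothetical model weights, evaluation inputs, and candidate list.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
inputs = [[1, 1, 1, 1, 1, 1], [1, -1, 1, -1, 1, -1], [2, 0, 1, 0, 0, 1]]
best, scores = select_ratio(weights, [0.25, 0.5, 0.75], inputs)
print(best, scores)
```

Under this assumed metric the most conservative candidate tends to score best unless a size or latency budget rules it out, which is why the claims tie the selection to a deployment budget.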

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for compressing a machine learning model, the computer-implemented method comprising: converting an input machine learning model by translating the input machine learning model, which may be structured using any machine learning framework, into a standard machine learning model that uses a predetermined framework, wherein the predetermined framework provides a common model definition format compatible across heterogeneous model sources; converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list; determining, for each of the pruned machine learning models, a size-to-error ratio; selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list, wherein the selection is based on a budget that comprises an available size for deploying the machine learning model and a latency for deploying the machine learning model; generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected; and deploying the compressed machine learning model for production to an execution system.

2. The computer-implemented method of claim 1, wherein determining the size-to-error ratio for each of the pruned machine learning models comprises determining a size-to-error ratio for each layer of each of the pruned machine learning models, and aggregating the size-to-error ratio for each layer.

3. The computer-implemented method of claim 1, wherein determining, for each of the pruned machine learning models, a size-to-error ratio comprises computing a model evaluation matrix that comprises an evaluation score for each of the pruned machine learning models.

4. The computer-implemented method of claim 1, wherein the compressed machine learning model has a smaller size than the input machine learning model.

5. The computer-implemented method of claim 1, wherein the compressed machine learning model is generated by a first computing system and is deployed to a second computing system that executes the compressed machine learning model.

6. A system comprising: a memory; and a processor coupled to the memory, the processor configured to perform a method comprising: converting an input machine learning model by translating the input machine learning model, which may be structured using any machine learning framework, into a standard machine learning model that uses a predetermined framework, wherein the predetermined framework provides a common model definition format compatible across heterogeneous model sources; converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list; determining, for each of the pruned machine learning models, a size-to-error ratio; selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list, wherein the selection is based on a budget that comprises an available size for deploying the machine learning model and a latency for deploying the machine learning model; generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected; and deploying the compressed machine learning model for production to an execution system.

7. The system of claim 6, wherein determining the size-to-error ratio for each of the pruned machine learning models comprises determining a size-to-error ratio for each layer of each of the pruned machine learning models, and aggregating the size-to-error ratio for each layer.

8. The system of claim 6, wherein determining, for each of the pruned machine learning models, a size-to-error ratio comprises computing a model evaluation matrix that comprises an evaluation score for each of the pruned machine learning models.

9. The system of claim 6, wherein the compressed machine learning model has a smaller size than the input machine learning model.

10. The system of claim 6, wherein the compressed machine learning model is generated by a first computing system and is deployed to a second computing system that executes the compressed machine learning model.

11. A computer program product comprising a non-transitory memory device with computer-executable instructions therein, the computer-executable instructions when executed by a processing unit perform a method comprising: converting an input machine learning model by translating the input machine learning model, which may be structured using any machine learning framework, into a standard machine learning model that uses a predetermined framework, wherein the predetermined framework provides a common model definition format compatible across heterogeneous model sources; converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning ratio candidate list; determining, for each of the pruned machine learning models, a size-to-error ratio; selecting, based on the size-to-error ratio of the pruned machine learning models, a first pruning ratio from the pruning ratio candidate list, wherein the selection is based on a budget that comprises an available size for deploying the machine learning model and a latency for deploying the machine learning model; generating a compressed machine learning model by compressing the input machine learning model using the first pruning ratio that is selected; and deploying the compressed machine learning model for production to an execution system.

12. The computer program product of claim 11, wherein determining the size-to-error ratio for each of the pruned machine learning models comprises determining a size-to-error ratio for each layer of each of the pruned machine learning models, and aggregating the size-to-error ratio for each layer.

13. The computer program product of claim 11, wherein determining, for each of the pruned machine learning models, a size-to-error ratio comprises computing a model evaluation matrix that comprises an evaluation score for each of the pruned machine learning models.

14. The computer program product of claim 11, wherein the compressed machine learning model has a smaller size than the input machine learning model.
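Two details in the claims lend themselves to a short illustration: the per-layer aggregation in claims 2, 7, and 12, and the size-and-latency budget in the independent claims. The sketch below is one possible reading, not the claimed method. It assumes the per-layer ratio is bytes saved divided by layer error, that aggregation is an arithmetic mean, and that the budget acts as a hard filter before the best score is picked. The class names, field names, and numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayerStats:
    original_bytes: int  # layer size before pruning
    pruned_bytes: int    # layer size after pruning
    error: float         # error attributed to this layer (assumed metric)


@dataclass
class Candidate:
    pruning_ratio: float
    layers: List[LayerStats]
    latency_ms: float    # measured inference latency (hypothetical value)

    @property
    def size_bytes(self) -> int:
        """Total size of the pruned model."""
        return sum(layer.pruned_bytes for layer in self.layers)

    def size_to_error(self, eps: float = 1e-9) -> float:
        """ASSUMED aggregation: mean of per-layer (bytes saved / error)."""
        ratios = [
            (layer.original_bytes - layer.pruned_bytes) / (layer.error + eps)
            for layer in self.layers
        ]
        return sum(ratios) / len(ratios)


def select_within_budget(
    candidates: List[Candidate], max_bytes: int, max_latency_ms: float
) -> Optional[Candidate]:
    """Drop candidates that exceed the budget, then take the best score."""
    feasible = [
        c for c in candidates
        if c.size_bytes <= max_bytes and c.latency_ms <= max_latency_ms
    ]
    if not feasible:
        return None  # no candidate fits the deployment budget
    return max(feasible, key=lambda c: c.size_to_error())


# Hypothetical measurements for three candidate pruning ratios.
candidates = [
    Candidate(0.3, [LayerStats(1000, 700, 0.01), LayerStats(2000, 1400, 0.02)], 12.0),
    Candidate(0.5, [LayerStats(1000, 500, 0.02), LayerStats(2000, 1000, 0.05)], 9.0),
    Candidate(0.8, [LayerStats(1000, 200, 0.20), LayerStats(2000, 400, 0.40)], 6.0),
]
chosen = select_within_budget(candidates, max_bytes=1800, max_latency_ms=10.0)
print(chosen.pruning_ratio if chosen else None)
```

In this example the 0.3 candidate has the best score but exceeds both limits, so the budget changes which ratio is selected. A weighted score or a soft penalty would be equally plausible readings of "based on a budget".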

Assignees

Inventors

Classifications

G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06F18/217
Validation; Performance evaluation; Active pattern learning techniques · CPC title
G06F18/211
Selection of the most significant subset of features · CPC title
G06N3/082 (Primary)
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
G06N3/045
Combinations of networks · CPC title

Patent family

Related publications grouped by family.

View patent family 89769186

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12566960B2 cover?: A computer-implemented method for compressing a machine learning model includes converting an input machine learning model into a standard machine learning model. The method further includes converting the standard machine learning model into a plurality of pruned machine learning models, each of the pruned machine learning models converted using a corresponding pruning ratio from a pruning rat…
Who is the assignee on this patent?: IBM

Related patents

Neural network architecture pruning

Machine learning model compression system, pruning method, and computer program product

Bayesian optimization of sparsity ratios in model compression

Device and method for compressing machine learning model
