Machine learning model compression

US2026099712A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2026099712-A1
Application number: US-202418905761-A
Country: US
Kind code: A1
Filing date: Oct 3, 2024
Priority date: Oct 3, 2024
Publication date: Apr 9, 2026
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are described herein for a method of machine learning model compression. The method includes receiving a machine learning model comprising a plurality of blocks. The method further includes removing one or more blocks of the plurality of blocks to obtain an intermediate machine learning model comprising a subset of the plurality of blocks. The method further includes adding a block to the intermediate machine learning model to obtain a compressed machine learning model. The block generates an output corresponding to an output of the removed one or more blocks of the plurality of blocks. The method further includes executing the compressed machine learning model on a low resource device.
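The remove-then-add flow in the abstract can be illustrated with a minimal sketch. It assumes a model can be treated as an ordered list of blocks, each a function from a vector to a vector. The names `run_model`, `compress`, `scale`, and `shift` are hypothetical and do not come from the publication, and the toy blocks are affine so that a single replacement block happens to reproduce the removed blocks exactly. In practice the added block would only approximate them.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def run_model(blocks: Sequence[Block], x: Vector) -> Vector:
    """Apply each block in order to the input vector."""
    for block in blocks:
        x = block(x)
    return x


def compress(blocks: Sequence[Block], start: int, stop: int, approx: Block) -> List[Block]:
    """Remove blocks[start:stop], then add one block in their place."""
    if not 0 <= start < stop <= len(blocks):
        raise ValueError("start/stop must select at least one existing block")
    # Intermediate model: a subset of the original blocks.
    intermediate = list(blocks[:start]) + list(blocks[stop:])
    # Compressed model: the intermediate model plus the added block.
    return intermediate[:start] + [approx] + intermediate[start:]


def scale(factor: float) -> Block:
    """Toy block (hypothetical): multiply every element by a constant."""
    return lambda v: [factor * a for a in v]


def shift(offset: float) -> Block:
    """Toy block (hypothetical): add a constant to every element."""
    return lambda v: [a + offset for a in v]


original = [scale(2.0), shift(1.0), scale(3.0), shift(-4.0)]

# Blocks 1 and 2 together compute 3 * (a + 1) = 3a + 3, so one block can stand in for both.
approx_block: Block = lambda v: [3.0 * a + 3.0 for a in v]

compressed = compress(original, 1, 3, approx_block)
```

Here `compressed` holds three blocks instead of four, and `run_model(compressed, x)` matches `run_model(original, x)` for these toy blocks. Removing a contiguous range mirrors the dependent claims that describe the removed blocks as sequential.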

First claim

Opening claim text (preview).

1. A method comprising: receiving a machine learning model comprising a plurality of blocks that each comprise a respective stack of sequential layers configured to perform a task in the machine learning model; removing one or more blocks of the plurality of blocks to obtain an intermediate machine learning model comprising a subset of the plurality of blocks; adding an approximated block to the intermediate machine learning model to obtain a compressed machine learning model, wherein the approximated block is configured to approximate operation of the removed one or more blocks of the plurality of blocks; and executing the compressed machine learning model on a low resource device, wherein the executing comprises: predicting, by the approximated block, a domain-specific region.

2. The method of claim 1, wherein the removed one or more blocks are sequential blocks of the plurality of blocks.

3. (canceled)

4. The method of claim 1, wherein the executing comprises: predicting, by the approximated block, a coordinate in an embedding space, wherein the coordinate is based on a sequence of inputs of the machine learning model.

5. The method of claim 1, wherein the executing comprises: providing, to the compressed machine learning model, a domain-specific query; and generating, by the compressed machine learning model, domain-specific natural language text responsive to the domain-specific query.

6. The method of claim 1, wherein the executing comprises: receiving, by the approximated block, an embedding of a token; generating, by the approximated block, an approximated output that approximates an output that would have been generated by the removed one or more blocks of the plurality of blocks, wherein the approximated output is an embedding of a next token.

7. The method of claim 1, wherein the removed one or more blocks are two or more blocks of the plurality of blocks.

8. A non-transitory computer-readable medium storing executable instructions, which when executed by a computing device, cause the computing device to perform operations comprising: receiving a machine learning model comprising a plurality of blocks that each comprise a respective stack of sequential layers configured to perform a task in the machine learning model; removing one or more blocks of the plurality of blocks to obtain an intermediate machine learning model comprising a subset of the plurality of blocks; adding an approximated block to the intermediate machine learning model to obtain a compressed machine learning model, wherein the approximated block is configured to approximate operation of the removed one or more blocks of the plurality of blocks, and wherein the approximated block is configured to predict a domain-specific region; and providing the compressed machine learning model for execution by a low resource device.

9. The non-transitory computer-readable medium of claim 8, wherein the removed one or more blocks are sequential blocks of the plurality of blocks.

10. (canceled)

11. The non-transitory computer-readable medium of claim 8, wherein the approximated block is configured to predict a coordinate in an embedding space, wherein the coordinate is based on a sequence of inputs of the machine learning model.

12. The non-transitory computer-readable medium of claim 8, wherein execution of the compressed machine learning model on the low resource device further comprises: providing, to the compressed machine learning model, a domain-specific query; and generating, by the compressed machine learning model, domain-specific natural language text responsive to the domain-specific query.

13. The non-transitory computer-readable medium of claim 8, the operations further comprising: receiving, by the approximated block, an embedding of a token; generating, by the approximated block, an approximated output that approximates an output that would have been generated by the removed one or more blocks of the plurality of blocks, wherein the approximated output is an embedding of a next token.

14. The non-transitory computer-readable medium of claim 8, wherein the removed one or more blocks are two or more blocks of the plurality of blocks.

15. A system comprising: a computing device configured to perform operations comprising: receiving a machine learning model comprising a plurality of blocks that each comprise a respective stack of sequential layers configured to perform a task in the machine learning model; removing one or more blocks of the plurality of blocks to obtain an intermediate machine learning model comprising a subset of the plurality of blocks; and adding an approximated block to the intermediate machine learning model to obtain a compressed machine learning model, wherein the approximated block is configured to approximate operation of the removed one or more blocks of the plurality of blocks, and wherein the approximated block is configured to predict a domain-specific region; and a low resource device configured to receive and execute the compressed machine learning model.

16. The system of claim 15, wherein the removed one or more blocks are sequential blocks of the plurality of blocks.

17. (canceled)

18. The system of claim 15, wherein the approximated block is configured to predict a coordinate of the output in an embedding space, wherein the coordinate is based on a sequence of inputs of the machine learning model.

19. The system of claim 15, wherein execution of the compressed machine learning model on the low resource device comprises: providing, to the compressed machine learning model, a domain-specific query; and generating, by the compressed machine learning model, domain-specific natural language text responsive to the domain-specific query.

20. The system of claim 15, wherein execution of the compressed machine learning model on the low resource device comprises: receiving, by the approximated block, an embedding of a token; generating, by the approximated block, an approximated output that approximates an output that would have been generated by the removed one or more blocks of the plurality of blocks, wherein the approximated output is an embedding of a next token.
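The claims require the approximated block to take an embedding and produce an output close to what the removed blocks would have produced, but the text on this page does not say how such a block is obtained. The sketch below shows one possible way, as an assumption only: record input/output pairs from the blocks to be removed, then fit a small per-dimension affine map by least squares. All function names, the toy `removed_blocks` stack, and the calibration embeddings are hypothetical, and the example does not model tokens or a vocabulary.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def fit_affine_per_dim(inputs: Sequence[Vector], targets: Sequence[Vector]) -> List[Tuple[float, float]]:
    """Least-squares fit of target[d] ~ w * input[d] + b, independently per dimension d."""
    n = len(inputs)
    params = []
    for d in range(len(inputs[0])):
        xs = [v[d] for v in inputs]
        ys = [v[d] for v in targets]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        w = cov_xy / var_x if var_x > 0 else 0.0
        params.append((w, mean_y - w * mean_x))
    return params


def make_approximated_block(params: Sequence[Tuple[float, float]]) -> Block:
    """Build a block that applies the fitted affine map to an embedding."""
    def block(embedding: Vector) -> Vector:
        return [w * a + b for (w, b), a in zip(params, embedding)]
    return block


def removed_blocks(embedding: Vector) -> Vector:
    """Stand-in for the removed stack: two toy blocks applied in sequence (assumption)."""
    hidden = [a * a for a in embedding]      # toy nonlinear block
    return [0.5 * h + 1.0 for h in hidden]   # toy affine block


def max_error(block: Block, samples: Sequence[Vector]) -> float:
    """Largest absolute gap between a block's output and the removed blocks' output."""
    return max(abs(p - t) for e in samples for p, t in zip(block(e), removed_blocks(e)))


# Calibration embeddings (made-up values) and the outputs the removed blocks give for them.
calibration = [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0], [1.5, 2.5], [2.0, 3.0]]
targets = [removed_blocks(e) for e in calibration]

approx = make_approximated_block(fit_affine_per_dim(calibration, targets))
```

Comparing `max_error(approx, calibration)` with `max_error(lambda e: e, calibration)` shows the difference between substituting a fitted block and simply skipping the removed blocks. The fitted block is cheaper than the stack it replaces but only approximates it, which is the trade-off the claims describe.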

Assignees

Salesforce Inc

Inventors

Classifications

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • G06N3/082 · Primary

    modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026099712A1 cover?
Techniques are described herein for a method of machine learning model compression. The method includes receiving a machine learning model comprising a plurality of blocks. The method further includes removing one or more blocks of the plurality of blocks to obtain an intermediate machine learning model comprising a subset of the plurality of blocks. The method further includes adding a block t…
Who is the assignee on this patent?
Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 9, 2026 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).