Granular neural network architecture search over low-level primitives
US-2024428071-A1 · Dec 26, 2024 · US
US2026099712A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2026099712-A1 |
| Application number | US-202418905761-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 3, 2024 |
| Priority date | Oct 3, 2024 |
| Publication date | Apr 9, 2026 |
| Grant date | — |
Official abstract text for this publication.
Techniques are described herein for a method of machine learning model compression. The method includes receiving a machine learning model comprising a plurality of blocks. The method further includes removing one or more blocks of the plurality of blocks to obtain an intermediate machine learning model comprising a subset of the plurality of blocks. The method further includes adding a block to the intermediate machine learning model to obtain a compressed machine learning model. The block generates an output corresponding to an output of the removed one or more blocks of the plurality of blocks. The method further includes executing the compressed machine learning model on a low resource device.
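The remove-then-approximate flow in the abstract can be illustrated with a toy sketch. It is not the method disclosed in the publication. The abstract does not say how the added block is obtained, so the least-squares affine fit below is an assumption made for illustration. Every name (`run`, `scale_shift`, `fit_affine`, `compress`) is hypothetical, and the "blocks" are simple elementwise functions rather than stacks of neural network layers.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def run(blocks: Sequence[Block], x: Vector) -> Vector:
    """Apply the blocks in order, as a sequential model would."""
    for block in blocks:
        x = block(x)
    return x


def scale_shift(a: float, b: float) -> Block:
    """Toy stand-in for a block: elementwise a * x + b (assumption)."""
    return lambda x: [a * v + b for v in x]


def fit_affine(xs: List[float], ys: List[float]) -> Block:
    """Least-squares fit of y = a * x + b, returned as a block."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return lambda x: [a * v + b for v in x]


def compress(blocks: Sequence[Block], start: int, stop: int,
             calibration: Sequence[Vector]) -> List[Block]:
    """Replace blocks[start:stop] with a single fitted approximated block."""
    removed = blocks[start:stop]
    xs: List[float] = []
    ys: List[float] = []
    for sample in calibration:
        hidden = run(blocks[:start], sample)  # input seen by the removed span
        target = run(removed, hidden)         # output the removed span produced
        xs.extend(hidden)
        ys.extend(target)
    approximated = fit_affine(xs, ys)
    return list(blocks[:start]) + [approximated] + list(blocks[stop:])


# Four toy blocks; remove the middle two and add one approximated block.
model = [scale_shift(2.0, 1.0), scale_shift(0.5, -1.0),
         scale_shift(3.0, 0.0), scale_shift(1.0, 2.0)]
calibration = [[0.0, 1.0], [2.0, -3.0]]
compressed = compress(model, 1, 3, calibration)

print(len(model), len(compressed))
print(run(model, [4.0]), run(compressed, [4.0]))
```

Because the removed toy blocks are themselves affine, the fitted block reproduces them up to floating-point error. With real nonlinear blocks the replacement would only approximate the removed computation, and how closely it does so would depend on the calibration data and on the form chosen for the approximated block.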
Opening claim text (preview).
1. A method comprising: receiving a machine learning model comprising a plurality of blocks that each comprise a respective stack of sequential layers configured to perform a task in the machine learning model; removing one or more blocks of the plurality of blocks to obtain an intermediate machine learning model comprising a subset of the plurality of blocks; adding an approximated block to the intermediate machine learning model to obtain a compressed machine learning model, wherein the approximated block is configured to approximate operation of the removed one or more blocks of the plurality of blocks; and executing the compressed machine learning model on a low resource device, wherein the executing comprises: predicting, by the approximated block, a domain-specific region.

2. The method of claim 1, wherein the removed one or more blocks are sequential blocks of the plurality of blocks.

3. (canceled)

4. The method of claim 1, wherein the executing comprises: predicting, by the approximated block, a coordinate in an embedding space, wherein the coordinate is based on a sequence of inputs of the machine learning model.

5. The method of claim 1, wherein the executing comprises: providing, to the compressed machine learning model, a domain-specific query; and generating, by the compressed machine learning model, domain-specific natural language text responsive to the domain-specific query.

6. The method of claim 1, wherein the executing comprises: receiving, by the approximated block, an embedding of a token; generating, by the approximated block, an approximated output that approximates an output that would have been generated by the removed one or more blocks of the plurality of blocks, wherein the approximated output is an embedding of a next token.

7. The method of claim 1, wherein the removed one or more blocks are two or more blocks of the plurality of blocks.

8. A non-transitory computer-readable medium storing executable instructions, which when executed by a computing device, cause the computing device to perform operations comprising: receiving a machine learning model comprising a plurality of blocks that each comprise a respective stack of sequential layers configured to perform a task in the machine learning model; removing one or more blocks of the plurality of blocks to obtain an intermediate machine learning model comprising a subset of the plurality of blocks; adding an approximated block to the intermediate machine learning model to obtain a compressed machine learning model, wherein the approximated block is configured to approximate operation of the removed one or more blocks of the plurality of blocks, and wherein the approximated block is configured to predict a domain-specific region; and providing the compressed machine learning model for execution by a low resource device.

9. The non-transitory computer-readable medium of claim 8, wherein the removed one or more blocks are sequential blocks of the plurality of blocks.

10. (canceled)

11. The non-transitory computer-readable medium of claim 8, wherein the approximated block is configured to predict a coordinate in an embedding space, wherein the coordinate is based on a sequence of inputs of the machine learning model.

12. The non-transitory computer-readable medium of claim 8, wherein execution of the compressed machine learning model on the low resource device further comprises: providing, to the compressed machine learning model, a domain-specific query; and generating, by the compressed machine learning model, domain-specific natural language text responsive to the domain-specific query.

13. The non-transitory computer-readable medium of claim 8, the operations further comprising: receiving, by the approximated block, an embedding of a token; generating, by the approximated block, an approximated output that approximates an output that would have been generated by the removed one or more blocks of the plurality of blocks, wherein the approximated output is an embedding of a next token.

14. The non-transitory computer-readable medium of claim 8, wherein the removed one or more blocks are two or more blocks of the plurality of blocks.

15. A system comprising: a computing device configured to perform operations comprising: receiving a machine learning model comprising a plurality of blocks that each comprise a respective stack of sequential layers configured to perform a task in the machine learning model; removing one or more blocks of the plurality of blocks to obtain an intermediate machine learning model comprising a subset of the plurality of blocks; and adding an approximated block to the intermediate machine learning model to obtain a compressed machine learning model, wherein the approximated block is configured to approximate operation of the removed one or more blocks of the plurality of blocks, and wherein the approximated block is configured to predict a domain-specific region; and a low resource device configured to receive and execute the compressed machine learning model.

16. The system of claim 15, wherein the removed one or more blocks are sequential blocks of the plurality of blocks.

17. (canceled)

18. The system of claim 15, wherein the approximated block is configured to predict a coordinate of the output in an embedding space, wherein the coordinate is based on a sequence of inputs of the machine learning model.

19. The system of claim 15, wherein execution of the compressed machine learning model on the low resource device comprises: providing, to the compressed machine learning model, a domain-specific query; and generating, by the compressed machine learning model, domain-specific natural language text responsive to the domain-specific query.

20. The system of claim 15, wherein execution of the compressed machine learning model on the low resource device comprises: receiving, by the approximated block, an embedding of a token; generating, by the approximated block, an approximated output that approximates an output that would have been generated by the removed one or more blocks of the plurality of blocks, wherein the approximated output is an embedding of a next token.
Quantised networks; Sparse networks; Compressed networks · CPC title
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title