Neural architecture search system and search method
US-2023385603-A1 · Nov 30, 2023 · US
US2023214629A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023214629-A1 |
| Application number | US-202117566375-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 30, 2021 |
| Priority date | Dec 30, 2021 |
| Publication date | Jul 6, 2023 |
| Grant date | — |
Official abstract text for this publication.
Generally discussed herein are devices, systems, and methods for improving architecture search and identification with constraints. A method can include receiving, at a compute device, a request for a transformer-based autoregressive language model (TBALM), the request specifying a maximum latency, identifying TBALM architectures that satisfy the maximum latency, identifying a TBALM architecture of the identified TBALM architectures that has a greatest number of decoder parameters resulting in an identified TBALM architecture, and providing the identified TBALM architecture.
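The selection step in the abstract can be read as a filter followed by an argmax. The sketch below is a minimal illustration under stated assumptions, not the applicant's implementation: the `Candidate` record, its field names, and the sample values are hypothetical, and it assumes latency and decoder parameter counts have already been measured for each candidate architecture.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one TBALM architecture (names are assumptions)."""
    name: str
    latency_ms: float      # latency measured on the target device
    decoder_params: int    # decoder parameter count, used as an accuracy proxy


def select_architecture(candidates: Iterable[Candidate],
                        max_latency_ms: float) -> Optional[Candidate]:
    """Return the candidate with the most decoder parameters among those
    whose latency does not exceed max_latency_ms, or None if none qualify."""
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.decoder_params)


# Illustrative values only; they are not taken from the patent.
CANDIDATES = [
    Candidate("small", latency_ms=40.0, decoder_params=20_000_000),
    Candidate("medium", latency_ms=90.0, decoder_params=55_000_000),
    Candidate("large", latency_ms=180.0, decoder_params=120_000_000),
]

chosen = select_architecture(CANDIDATES, max_latency_ms=100.0)
print(chosen.name if chosen else "no architecture satisfies the constraint")
```

Returning `None` when no candidate meets the latency limit is a design choice of this sketch; the abstract does not say how an unsatisfiable request is handled.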
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, at a compute device, a request for a transformer-based autoregressive language model (TBALM), the request specifying a maximum latency; identifying TBALM architectures that satisfy the maximum latency; identifying a TBALM architecture of the identified TBALM architectures that has a greatest number of decoder parameters resulting in an identified TBALM architecture; and providing the identified TBALM architecture.
2. The method of claim 1, wherein: the request further specifies a maximum amount of memory consumed by the TBALM; and identifying the TBALM architecture includes identifying the TBALM architecture of the respective architectures that (i) satisfies the maximum latency, (ii) satisfies the maximum amount of memory consumed, and (iii) has a greatest number of decoder parameters for the architectures that satisfy both the maximum latency and maximum amount of memory consumed resulting in the identified TBALM architecture.
3. The method of claim 1, further comprising using a total number of decoder parameters of the architecture as a proxy for architecture accuracy.
4. The method of claim 3, wherein the total number of decoder parameters includes weights and biases of only the decoder of the TBALM architecture.
5. The method of claim 4, wherein the decoder parameters include, of the identified TBALM architecture: weights of attention heads; model dimensions; inner dimension of a feed forward network (FFN); and number of decoder layers.
6. The method of claim 3, wherein: the compute device is a client device; and the method further comprises generating, by the compute device, a pareto curve of number of decoder parameters versus latency for a variety of TBALM architectures, based on a processor of the compute device resulting in a generated pareto curve.
7. The method of claim 6, wherein identifying the TBALM architecture of the respective architectures that (i) satisfies the maximum latency and (ii) has a greatest number of decoder parameters for the architectures that satisfy the maximum latency includes selecting the TBALM corresponding to a point at a boundary of the generated pareto curve.
8. A device comprising: processing circuitry; a memory including instructions that, when executed by the processing circuitry, cause the processing circuitry to perform operations comprising: receiving a request for a transformer-based autoregressive language model (TBALM), the request specifying a maximum latency; identifying TBALM architectures that satisfy the maximum latency; identifying a TBALM architecture of the identified TBALM architectures that has a greatest number of decoder parameters resulting in an identified TBALM architecture; and providing the identified TBALM architecture.
9. The system of claim 8, wherein: the request further specifies a maximum amount of memory consumed by the TBALM; and identifying the TBALM architecture includes identifying the TBALM architecture of the respective architectures that (i) satisfies the maximum latency, (ii) satisfies the maximum amount of memory consumed, and (iii) has a greatest number of decoder parameters for the architectures that satisfy both the maximum latency and maximum amount of memory consumed resulting in the identified TBALM architecture.
10. The system of claim 8, wherein the operations further comprise using a total number of decoder parameters of the architecture as a proxy for architecture accuracy.
11. The system of claim 10, wherein the total number of decoder parameters includes weights and biases of only the decoder of the TBALM architecture.
12. The system of claim 11, wherein the decoder parameters include, of the identified TBALM architecture: weights of attention heads; model dimensions; inner dimension of a feed forward network (FFN); and number of decoder layers.
13. The system of claim 10, wherein: the processing circuitry is part of a client device; and the operations further comprise generating, by the compute device, a pareto curve of number of decoder parameters versus latency for a variety of TBALM architectures, based on a processor of the compute device resulting in a generated pareto curve.
14. The system of claim 13, wherein identifying the TBALM architecture of the respective architectures that (i) satisfies the maximum latency and (ii) has a greatest number of decoder parameters for the architectures that satisfy the maximum latency includes selecting the TBALM corresponding to a point at a boundary of the generated pareto curve.
15. A machine-readable medium including instructions that, when executed by a machine, cause the machine to perform operations comprising: receiving a request for a transformer-based autoregressive language model (TBALM), the request specifying a maximum latency; identifying TBALM architectures that satisfy the maximum latency; identifying a TBALM architecture of the identified TBALM architectures that has a greatest number of decoder parameters resulting in an identified TBALM architecture; and providing the identified TBALM architecture.
16. The machine-readable medium of claim 15, wherein: the request further specifies a maximum amount of memory consumed by the TBALM; and identifying the TBALM architecture includes identifying the TBALM architecture of the respective architectures that (i) satisfies the maximum latency, (ii) satisfies the maximum amount of memory consumed, and (iii) has a greatest number of decoder parameters for the architectures that satisfy both the maximum latency and maximum amount of memory consumed resulting in the identified TBALM architecture.
17. The machine-readable medium of claim 15, wherein the operations further comprise using a total number of decoder parameters of the architecture as a proxy for architecture accuracy.
18. The machine-readable medium of claim 17, wherein the total number of decoder parameters includes weights and biases of only the decoder of the TBALM architecture.
19. The machine-readable medium of claim 18, wherein the decoder parameters include, of the identified TBALM architecture: weights of attention heads; model dimensions; inner dimension of a feed forward network (FFN); and number of decoder layers.
20. The machine-readable medium of claim 17, wherein: the machine is a client device; and the operations further comprise generating, by the compute device, a pareto curve of number of decoder parameters versus latency for a variety of TBALM architectures, based on a processor of the compute device resulting in a generated pareto curve.
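Claims 2, 6, and 7 add an optional memory limit and a Pareto curve of decoder parameters versus latency, with the selected architecture taken from the boundary of that curve. The following sketch shows one way these pieces could fit together. It is an assumption-laden illustration rather than the claimed implementation: the `Arch` record, the function names, the choice to filter by memory before building the frontier, and the sample measurements are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Arch(NamedTuple):
    """Hypothetical per-architecture measurements (field names are assumptions)."""
    name: str
    latency_ms: float
    memory_mb: float
    decoder_params: int


def pareto_frontier(archs: List[Arch]) -> List[Arch]:
    """Keep architectures not dominated in (lower latency, more decoder params).

    After sorting by latency ascending (ties: more parameters first), a single
    pass keeps each architecture that has strictly more decoder parameters
    than every architecture that is at least as fast.
    """
    frontier: List[Arch] = []
    best_params = -1
    for a in sorted(archs, key=lambda a: (a.latency_ms, -a.decoder_params)):
        if a.decoder_params > best_params:
            frontier.append(a)
            best_params = a.decoder_params
    return frontier


def select(archs: List[Arch], max_latency_ms: float,
           max_memory_mb: Optional[float] = None) -> Optional[Arch]:
    """Pick the frontier point with the most decoder parameters that meets
    the latency limit and, if given, the memory limit."""
    within_memory = [a for a in archs
                     if max_memory_mb is None or a.memory_mb <= max_memory_mb]
    feasible = [a for a in pareto_frontier(within_memory)
                if a.latency_ms <= max_latency_ms]
    # The frontier is ordered by increasing latency and increasing parameters,
    # so the last feasible point has the greatest decoder parameter count.
    return feasible[-1] if feasible else None


# Illustrative measurements only; they are not taken from the patent.
ARCHS = [
    Arch("a", 40.0, 300.0, 20_000_000),
    Arch("b", 60.0, 350.0, 18_000_000),   # dominated by "a"
    Arch("c", 90.0, 700.0, 55_000_000),
    Arch("d", 95.0, 500.0, 50_000_000),   # dominated by "c" unless memory-limited
    Arch("e", 180.0, 900.0, 120_000_000),
]

print([a.name for a in pareto_frontier(ARCHS)])
print(select(ARCHS, max_latency_ms=100.0))
print(select(ARCHS, max_latency_ms=100.0, max_memory_mb=600.0))
```

Because latency depends on the processor of the device that runs the model, the frontier in this sketch would need to be rebuilt from measurements taken on each target device, which is consistent with claim 6 generating the curve on the client device.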
Architecture, e.g. interconnection topology · CPC title
Physics · mapped topic
Converting codes to words; Guess-ahead of partial word inputs · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title