Granular neural network architecture search over low-level primitives
US-2024428071-A1 · Dec 26, 2024 · US
US2025156684A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025156684-A1 |
| Application number | US-202418587052-A |
| Country | US |
| Kind code | A1 |
| Filing date | Feb 26, 2024 |
| Priority date | Nov 10, 2023 |
| Publication date | May 15, 2025 |
| Grant date | — |
An adapter to a base model of an artificial intelligence (AI) system is disclosed. The adapter includes a connector to connect the adapter to the base model such that during an operation of the AI system at least some portion of data transformed by the base model is propagated from the base model to the adapter and back from the adapter to the base model. The adapter includes a non-linear modifier to modify the data received from the base model non-linearly before returning the modified portion of the data back to the base model, and an AI trainer to tune the non-linear modifier of the adapter by propagating training data through the base model and the adapter and updating weights of the non-linear modifier of the adapter for given weights of the base model to optimize a loss function. Further, weight matrices for the base model and the adapter are jointly constructed by an additional module, which efficiently allocates parameters from a shared pool to reduce the memory required for adapting the AI system.
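The adapter loop in the abstract has three parts. Data leaves the base model, passes through a non-linear modifier, and returns. Only the adapter's weights are tuned while the base weights stay fixed. A minimal pure-Python sketch of that loop follows, under loudly stated assumptions:

- The "base model" is a single frozen linear layer.
- The non-linear modifier is a tanh bottleneck, `U · tanh(D · h)`, whose output is added back to the base output.
- The loss is mean squared error on a synthetic task.
- Every name (`W_base`, `D`, `U`, `train`) is hypothetical, and the gradients are derived by hand rather than taken from the patent.

```python
import math
import random

# Toy sizes (hypothetical): hidden width N and adapter bottleneck width R < N.
N, R = 3, 2
rng = random.Random(0)


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Frozen base model: one linear layer h = W_base x (a stand-in for a real network).
W_base = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(N)]

# Adapter weights: down-projection D (R x N) and up-projection U (N x R).
# U starts at zero, so before tuning the adapter adds nothing to the base output.
D = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(R)]
U = [[0.0] * R for _ in range(N)]


def forward(x):
    h = matvec(W_base, x)                      # data transformed by the base model
    s = [math.tanh(z) for z in matvec(D, h)]   # non-linear modification inside the adapter
    a = matvec(U, s)
    out = [hi + ai for hi, ai in zip(h, a)]    # modified data returned to the base model
    return h, s, out


def loss_and_grads(data):
    """Mean squared-error loss and its gradients w.r.t. the adapter weights only."""
    gU = [[0.0] * R for _ in range(N)]
    gD = [[0.0] * N for _ in range(R)]
    total = 0.0
    for x, t in data:
        h, s, out = forward(x)
        e = [o - ti for o, ti in zip(out, t)]   # dL/d(out) for L = 0.5 * ||out - t||^2
        total += 0.5 * sum(ei * ei for ei in e)
        for i in range(N):
            for j in range(R):
                gU[i][j] += e[i] * s[j]
        for j in range(R):
            ds = sum(U[i][j] * e[i] for i in range(N))
            dz = ds * (1.0 - s[j] ** 2)         # derivative of tanh
            for k in range(N):
                gD[j][k] += dz * h[k]
    n = len(data)
    return (total / n,
            [[g / n for g in row] for row in gU],
            [[g / n for g in row] for row in gD])


def make_data(n=32):
    """Synthetic task: targets are the base output plus a non-linear shift."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(N)]
        h = matvec(W_base, x)
        data.append((x, [hi + 0.5 * math.sin(xi) for hi, xi in zip(h, x)]))
    return data


def train(data, steps=200, lr=0.1):
    """Gradient descent on U and D; W_base is never written to."""
    history = []
    for _ in range(steps):
        loss, gU, gD = loss_and_grads(data)
        history.append(loss)
        for i in range(N):
            for j in range(R):
                U[i][j] -= lr * gU[i][j]
        for j in range(R):
            for k in range(N):
                D[j][k] -= lr * gD[j][k]
    return history


W_before = [row[:] for row in W_base]
history = train(make_data())
print(f"loss {history[0]:.4f} -> {history[-1]:.4f}")
assert W_base == W_before  # the base model stayed frozen
```

Initializing `U` to zero is a common adapter choice, assumed here rather than taken from the document. It means the adapted model starts out identical to the base model, and tuning only moves it away from the base behavior as the loss demands.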
What is claimed is:
1. An adapter to a base model of an artificial intelligence (AI) system, the adapter comprising: a connector configured to connect the adapter to the base model such that during an operation of the AI system at least some portion of data transformed by the base model is propagated from the base model to the adapter and back from the adapter to the base model; a non-linear modifier configured to modify the data received from the base model non-linearly before returning the modified portion of the data back to the base model; and an AI trainer configured to tune the non-linear modifier of the adapter by propagating training data through the base model and the adapter and updating weights of the non-linear modifier of the adapter for given weights of the base model to optimize a loss function.
2. The adapter of claim 1, wherein the non-linear modifier includes multiple paths formed by multiple AI architectures of data transformation, each of the paths is either a linear path configured to modify the received data linearly or a non-linear path configured to modify the received data non-linearly, wherein an AI architecture of the linear path modifies the received data linearly using one or multiple weight matrices, wherein an AI architecture of the non-linear path modifies the received data linearly using one or multiple weight matrices and modifies the received data non-linearly using one or multiple non-linear functions, and wherein the non-linear modifier includes at least one non-linear path.
3. The adapter of claim 2, wherein the non-linear modifier includes multiple non-linear paths using different non-linear functions, different arrangements of the same non-linear functions with respect to the weight matrices, or both.
4. The adapter of claim 3, wherein the non-linear modifier includes at least one linear path.
5. The adapter of claim 3, wherein the multiple non-linear paths include the same weight matrices.
6. The adapter of claim 3, wherein the multiple non-linear paths share at least some weights.
7. The adapter of claim 3, wherein weights in the weight matrices of the multiple non-linear paths come from a common pool of parameters, such that to tune the non-linear modifier, the AI trainer updates the common pool of parameters.
8. The adapter of claim 2, wherein the non-linear modifier comprises: a path splitter configured to direct the received data to each of the paths; and a path combiner configured to combine outputs of each of the paths to submit a combined output back to the base model.
9. The adapter of claim 8, wherein the path combiner combines the outputs using an operation including one or a combination of: an identity, a duplication, a permutation, a polynomial basis expansion, a Fourier basis expansion, an addition, a multiplication, a division, a subtraction, a modulo-addition, a modulo-product, a Kronecker product, a Kronecker sum, a Hadamard product, a concatenation, a log-sum-exp, an affine transform, a convolution, randomization, a normalization, a nonlinear activation operation, and variants thereof.
10. The adapter of claim 9, wherein the operation of the path combiner includes a parameter learned during the tuning of the AI trainer.
11. The adapter of claim 2, wherein the AI architecture of the non-linear path includes a bottleneck configuration of multiple layers.
12. The adapter of claim 1, wherein the AI trainer is further configured to approximate the base model and train the adapter and to achieve a common objective.
13. The adapter of claim 2, wherein the AI trainer further comprises a weight constructor comprising a pool of parameters and a set of hyperparameters forming rules of propagation of the parameters from the pool of parameters into the weight matrices of the multiple paths of the non-linear modifier, and wherein the weight constructor is configured to: update the pool of parameters and the set of hyperparameters for given weights of the base model; and propagate the parameters from the pool of parameters to different weight matrices of different paths according to the trained hyperparameters.
14. The adapter of claim 1, wherein the AI trainer updates weights of the adapter for frozen weights of the base model.
15. The adapter of claim 1, wherein weight matrices of the adapter have lower dimensions than weight matrices of the base model.
16. The adapter of claim 1, wherein weight matrices of the adapter are coming from a pool of parameters updated by the AI trainer during the tuning, and wherein a number of parameters in the pool of parameters is more than 1000 times less than a number of parameters of the base model.
17. A method for adapting a base model of an artificial intelligence (AI) system using an adapter, the method comprising: connecting, using a connector of the adapter, the adapter to the base model such that during an operation of the AI system at least some portion of data transformed by the base model is propagated from the base model to the adapter and back from the adapter to the base model; modifying, using a non-linear modifier of the adapter, the data received from the base model non-linearly before returning the modified portion of the data back to the base model; and tuning, using an AI trainer of the adapter, the non-linear modifier of the adapter by propagating training data through the base model and the adapter and updating weights of the non-linear modifier of the adapter for given weights of the base model to optimize a loss function.
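Claims 2 to 11 describe a multi-path modifier. A path splitter copies the incoming data to several paths. Each path is linear, or non-linear with a different function or a different placement of that function. A path combiner, possibly with a learned parameter, merges the path outputs. A small pure-Python sketch of that structure follows, with these assumptions:

- Three hypothetical paths share one 3 → 2 → 3 bottleneck pair of matrices.
- Only three of the combiner operations listed in claim 9 are implemented: weighted addition, Hadamard product, and log-sum-exp.
- The weights are fixed by hand and training is omitted.
- The helper names (`make_paths`, `split`, `combine`, `modifier`) are illustrative, not taken from the patent.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]
Path = Callable[[Vector], Vector]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def tanh(v: Sequence[float]) -> Vector:
    return [math.tanh(x) for x in v]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


def make_paths(down: Matrix, up: Matrix) -> List[Path]:
    """Three paths that all reuse the same bottleneck matrices (cf. claims 4, 5, 11)."""
    return [
        lambda h: matvec(up, matvec(down, h)),          # linear path
        lambda h: matvec(up, tanh(matvec(down, h))),    # non-linearity between the matrices
        lambda h: matvec(up, matvec(down, relu(h))),    # different function, placed before them
    ]


def split(h: Vector, n_paths: int) -> List[Vector]:
    """Path splitter: each path receives its own copy of the data."""
    return [list(h) for _ in range(n_paths)]


def combine(outputs: List[Vector], mode: str, alphas: Sequence[float]) -> Vector:
    """Path combiner using a few of the operations listed in claim 9."""
    width = len(outputs[0])
    if mode == "add":  # weighted addition; alphas play the role of claim 10's learnable parameter
        return [sum(a * o[i] for a, o in zip(alphas, outputs)) for i in range(width)]
    if mode == "hadamard":  # element-wise product of all path outputs
        result = [1.0] * width
        for o in outputs:
            result = [r * x for r, x in zip(result, o)]
        return result
    if mode == "logsumexp":  # element-wise smooth maximum over paths
        return [math.log(sum(math.exp(o[i]) for o in outputs)) for i in range(width)]
    raise ValueError(f"unknown combiner mode: {mode}")


def modifier(h: Vector, paths: List[Path], mode: str = "add",
             alphas: Optional[Sequence[float]] = None) -> Vector:
    if alphas is None:
        alphas = [1.0 / len(paths)] * len(paths)
    outputs = [path(x) for path, x in zip(paths, split(h, len(paths)))]
    return combine(outputs, mode, alphas)


# Usage with hand-picked weights: a 3 -> 2 -> 3 bottleneck.
down = [[0.5, -0.25, 0.0], [0.0, 0.5, 0.5]]
up = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
paths = make_paths(down, up)
h = [1.0, -1.0, 2.0]
delta = modifier(h, paths)                     # averaged path outputs
adapted = [a + b for a, b in zip(h, delta)]    # combined output handed back to the base model
```

In this sketch, setting `alphas` to `[1.0, 0.0, 0.0]` reduces the modifier to its linear path. Those mixing weights are therefore a natural thing for a trainer to learn.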
18. The method of claim 17, wherein the non-linear modifier includes multiple paths formed by multiple AI architectures of data transformation, each of the paths is either a linear path configured to modify the received data linearly or a non-linear path configured to modify the received data non-linearly, wherein an AI architecture of the linear path modifies the received data linearly using one or multiple weight matrices, wherein an AI architecture of the non-linear path modifies the received data linearly using one or multiple weight matrices and modifies the received data non-linearly using one or multiple non-linear functions, and wherein the non-linear modifier includes at least one non-linear path.
19. The method of claim 18, wherein the AI trainer further comprises a weight constructor comprising a pool of parameters and a set of hyperparameters forming rules of propagation of the parameters from the pool of parameters into the weight matrices of the multiple paths of the non-linear modifier, and wherein the method further comprises: updating, using the weight constructor, the pool of parameters and the set of hyperparameters for given weights of the base model; and propagating, using the weight constructor, the parameters from the pool of parameters to different weight matrices of different paths according to the trained hyperparameters.
20. A non-transitory computer readable storage medium embodied thereon a program executable by a processor for performing a method, the method comprising: connecting an adapter to a base model of an artificial intelligence (AI) system such that during an operation of the AI system at least some portion of data transformed by the base model is propagated from the base model to the adapter and back from the adapter to the base model; modifying the data received from the base model non-linearly before returning the modified portion of the data back to the base model; and t
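The weight constructor of claims 13 and 19 holds a pool of parameters plus hyperparameters that act as propagation rules. Together they decide which pool entries fill which path's weight matrices. A pure-Python sketch of that idea follows, with these assumptions:

- Each rule is a hypothetical `(offset, stride)` index scheme chosen only for illustration. The document does not say how the rules are formed.
- The rules stay fixed here, although the claims also have the trainer update the hyperparameters.
- `Rule`, `WeightConstructor`, and their methods are invented names.

Because a pool entry may feed several matrices, its gradient is the sum of the gradients of every matrix entry it populates.

```python
from dataclasses import dataclass
from typing import Dict, List

Matrix = List[List[float]]


@dataclass(frozen=True)
class Rule:
    """Hypothetical propagation rule: maps entry (i, j) of one weight matrix to a pool index."""
    rows: int
    cols: int
    offset: int
    stride: int

    def index(self, i: int, j: int, pool_size: int) -> int:
        return (self.offset + self.stride * (i * self.cols + j)) % pool_size


class WeightConstructor:
    """Builds every path's weight matrices from one shared pool of parameters."""

    def __init__(self, pool: List[float], rules: Dict[str, Rule]):
        self.pool = list(pool)
        self.rules = rules

    def build(self) -> Dict[str, Matrix]:
        size = len(self.pool)
        return {
            name: [[self.pool[r.index(i, j, size)] for j in range(r.cols)] for i in range(r.rows)]
            for name, r in self.rules.items()
        }

    def apply_grads(self, grads: Dict[str, Matrix], lr: float) -> None:
        """Scatter matrix gradients back to the pool; a shared entry sums its contributions."""
        size = len(self.pool)
        pool_grad = [0.0] * size
        for name, g in grads.items():
            r = self.rules[name]
            for i in range(r.rows):
                for j in range(r.cols):
                    pool_grad[r.index(i, j, size)] += g[i][j]
        self.pool = [p - lr * gp for p, gp in zip(self.pool, pool_grad)]

    def num_matrix_entries(self) -> int:
        return sum(r.rows * r.cols for r in self.rules.values())


# Usage: two 2 x 3 matrices (12 entries) drawn from a pool of only 8 parameters.
rules = {
    "path_a.down": Rule(rows=2, cols=3, offset=0, stride=1),
    "path_b.down": Rule(rows=2, cols=3, offset=3, stride=1),  # overlaps path_a, so weights are shared
}
wc = WeightConstructor([0.1 * k for k in range(8)], rules)
mats = wc.build()
before = list(wc.pool)
ones = {name: [[1.0] * r.cols for _ in range(r.rows)] for name, r in rules.items()}
wc.apply_grads(ones, lr=0.1)
```

Claim 16 sets a scale target for such a pool. For a hypothetical 7-billion-parameter base model, "more than 1000 times less" means a pool of fewer than 7 million parameters.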
using neural networks · CPC title
Transfer learning · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
Semantic analysis · CPC title
Backpropagation, e.g. using gradient descent · CPC title