Systems and methods for artificial-intelligence model training using unsupervised domain adaptation with multi-source meta-distillation

US2024046107A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2024046107-A1
Application number: US-202217966568-A
Country: US
Kind code: A1
Filing date: Oct 14, 2022
Priority date: Aug 8, 2022
Publication date: Feb 8, 2024
Grant date: (none listed)

Abstract

Official abstract text for this publication.

A method has the steps of obtaining a set of training samples from one or more domains, using the set of training samples to query a plurality of artificial-intelligence (AI) models, combining the outputs of the queried AI models, and adapting a target AI model via knowledge distillation using the combined outputs.
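The four steps in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and does not come from the patent: the two source models (`source_a`, `source_b`) are hand-written functions, the outputs are combined by a plain average, and the target model is a small linear softmax classifier (`LinearStudent`) adapted by gradient descent on the KL divergence between the combined output and its own prediction.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def combine_outputs(outputs):
    """Average the per-model class probabilities (one simple way to combine)."""
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]

class LinearStudent:
    """Hypothetical target model: linear logits followed by softmax."""
    def __init__(self, n_features, n_classes):
        self.w = [[0.0] * n_features for _ in range(n_classes)]

    def predict(self, x):
        return softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in self.w])

    def distill_step(self, x, soft_label, lr=0.5):
        q = self.predict(x)
        # Gradient of KL(soft_label || softmax(z)) w.r.t. logit z_c is q_c - p_c.
        for c, row in enumerate(self.w):
            g = q[c] - soft_label[c]
            for j in range(len(row)):
                row[j] -= lr * g * x[j]

# Hypothetical pre-trained source models (assumptions, not from the patent).
def source_a(x):
    return softmax([2.0 * x[0], 2.0 * x[1]])

def source_b(x):
    return softmax([x[0] - x[1], 0.0])

sources = (source_a, source_b)
samples = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]]  # step 1: training samples
student = LinearStudent(n_features=2, n_classes=2)

def mean_kl():
    return sum(
        kl_divergence(combine_outputs([m(x) for m in sources]), student.predict(x))
        for x in samples
    ) / len(samples)

kl_before = mean_kl()
for _ in range(200):
    for x in samples:
        outputs = [m(x) for m in sources]   # step 2: query the source models
        pseudo = combine_outputs(outputs)   # step 3: combine their outputs
        student.distill_step(x, pseudo)     # step 4: adapt the target by distillation
kl_after = mean_kl()
print(f"mean KL before: {kl_before:.4f}, after: {kl_after:.4f}")
```

The plain average is only the simplest possible combiner. The dependent claims describe weighted combinations and a transformer encoder, neither of which this sketch attempts to reproduce.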

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining a set of training samples from one or more domains; using the set of training samples to query a plurality of artificial-intelligence (AI) models; combining the outputs of the queried AI models; and adapting a target AI model via knowledge distillation using the combined outputs.

2. The method of claim 1, wherein said combining the outputs of the queried AI models comprises: using a transformer encoder for combining the outputs of the queried AI models.

3. The method of claim 1, wherein said obtaining the set of training samples from the one or more domains comprises: obtaining the set of training samples from a plurality of domains, the set of training samples comprises a plurality of subsets of training samples obtained from the plurality of domains; wherein said using the set of training samples to query the plurality of AI models comprises: using each subset of training samples to query the plurality of AI models except an excluded AI model of the plurality of AI models; and wherein the excluded AI models of the plurality of subset of training samples are different AI models.

4. The method of claim 1, wherein said combining the outputs of the queried AI models comprises: weighting the outputs of the queried AI models, and combining the weighted outputs of the queried AI models to obtain a soft pseudo-label; and wherein said adapting the target AI model via the knowledge distillation using the combined outputs comprises: adapting the target AI model via the knowledge distillation using the soft pseudo-label.

5. The method of claim 4, wherein said adapting the target AI model via the knowledge distillation using the combined outputs and the soft pseudo-label comprises: querying the target AI model using the set of training samples; and adapting the target AI model via the knowledge distillation based on Kullback-Leibler (KL) divergence of the output of the queried target AI model and the soft pseudo-label.

6. The method of claim 5, wherein said adapting the target AI model via the knowledge distillation based on the KL divergence of the output of the queried target AI model and the soft pseudo-label comprises: minimizing the KL divergence using a gradient descent method.

7. The method of claim 1 further comprising: evaluating a loss of the target AI model; and updating a plurality of parameters based on the evaluated loss; wherein the plurality of parameters comprises one or more first parameters of the target AI model and a parameter used in said combining the outputs of the queried AI models.

8. The method of claim 7, wherein said evaluating a loss of the target AI model comprises: querying the target AI model using a set of query samples, and evaluating a cross-entropy (CE) loss between the outputs of the queried target AI model and a set of labels corresponding to the set of query samples; and wherein said updating the plurality of parameters based on the evaluated loss comprises: updating the plurality of parameters by minimizing the CE loss.

9. The method of claim 8, wherein said updating the plurality of parameters by minimizing the CE loss comprises: updating the plurality of parameters by minimizing the CE loss using a gradient descent method.

10. An apparatus comprising: at least one processor for performing actions comprising: obtaining a set of training samples from one or more domains; using the set of training samples to query a plurality of AI models; combining the outputs of the queried AI models; and adapting a target AI model via knowledge distillation using the combined outputs.

11. The apparatus of claim 10, wherein said combining the outputs of the queried AI models comprises: using a transformer encoder for combining the outputs of the queried AI models.

12. The apparatus of claim 10, wherein said obtaining the set of training samples from the one or more domains comprises: obtaining the set of training samples from a plurality of domains, the set of training samples comprises a plurality of subsets of training samples obtained from the plurality of domains; wherein said using the set of training samples to query the plurality of AI models comprises: using each subset of training samples to query the plurality of AI models except an excluded AI model of the plurality of AI models; and wherein the excluded AI models of the plurality of subset of training samples are different AI models.

13. The apparatus of claim 10, wherein said combining the outputs of the queried AI models comprises: weighting the outputs of the queried AI models, and combining the weighted outputs of the queried AI models to obtain a soft pseudo-label; and wherein said adapting the target AI model via the knowledge distillation using the combined outputs comprises: adapting the target AI model via the knowledge distillation using the soft pseudo-label.

14. The apparatus of claim 13, wherein said adapting the target AI model via the knowledge distillation using the combined outputs and the soft pseudo-label comprises: querying the target AI model using the set of training samples; and adapting the target AI model via the knowledge distillation based on KL divergence of the output of the queried target AI model and the soft pseudo-label.

15. The apparatus of claim 10, wherein the at least one processor is configured for performing further actions comprising: evaluating a loss of the target AI model; and updating a plurality of parameters based on the evaluated loss; wherein the plurality of parameters comprises one or more first parameters of the target AI model and a parameter used in said combining the outputs of the queried AI models.

16. The apparatus of claim 15, wherein said evaluating a loss of the target AI model comprises: querying the target AI model using a set of query samples, and evaluating a CE loss between the outputs of the queried target AI model and a set of labels corresponding to the set of query samples; and wherein said updating the plurality of parameters based on the evaluated loss comprises: updating the plurality of parameters by minimizing the CE loss.

17. One or more non-transitory computer-readable storage devices comprising computer-executable instructions, wherein the instructions, when executed, cause a processing structure to perform actions comprising: obtaining a set of training samples from one or more domains; using the set of training samples to query a plurality of AI models; combining the outputs of the queried AI models; and adapting a target AI model via knowledge distillation using the combined outputs.

18. The one or more non-transitory computer-readable storage devices of claim 17, wherein said combining the outputs of the queried AI models comprises: using a transformer encoder for combining the outputs of the queried AI models.

19. The one or more non-transitory computer-readable storage devices of claim 17, wherein said obtaining the set of training samples from the one or more domains comprises: obtaining the set of training samples from a plurality of domains, the set of training samples comprises a plurality of subsets of training samples obtained from the plurality of domains; wherein said using the set of training samples to query the plurality of AI models comprises: using each subset of training sam
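Claims 3 to 9 describe a leave-one-out query, a weighted soft pseudo-label, KL-based distillation, and a cross-entropy update of both the target model and a combiner parameter. The sketch below is a rough illustration under loud assumptions, none of which come from the patent: the three source models are hand-written functions, the combiner is a softmax over learnable scores (a stand-in for, not an implementation of, the transformer encoder in claim 2), the combiner gradient is estimated by finite differences, the target update uses a first-order approximation, and all domains share the same toy samples.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(w, x):
    """Hypothetical linear target model: one weight row per class."""
    return softmax([sum(a * b for a, b in zip(row, x)) for row in w])

def soft_pseudo_label(outputs, scores):
    """Weight the queried models' outputs and sum them into a soft pseudo-label."""
    weights = softmax(scores)
    n_classes = len(outputs[0])
    return [sum(wk * out[c] for wk, out in zip(weights, outputs))
            for c in range(n_classes)]

def adapt(w, support, models, scores, lr=0.5, steps=10):
    """Inner loop: gradient descent on KL(pseudo-label || target output)."""
    w = [row[:] for row in w]
    for _ in range(steps):
        for x in support:
            p = soft_pseudo_label([m(x) for m in models], scores)
            q = predict(w, x)
            for c, row in enumerate(w):
                for j in range(len(row)):
                    row[j] -= lr * (q[c] - p[c]) * x[j]  # d KL / d w[c][j]
    return w

def ce_loss(w, query):
    """Mean cross-entropy of the target model on labelled query samples."""
    return -sum(math.log(predict(w, x)[y]) for x, y in query) / len(query)

def meta_step(w, scores, support, query, models, active, lr=0.5, h=1e-4):
    """Outer loop: lower the post-adaptation CE loss w.r.t. w and the scores."""
    def loss_for(sc):
        adapted = adapt(w, support, [models[k] for k in active],
                        [sc[k] for k in active])
        return ce_loss(adapted, query), adapted

    base, adapted = loss_for(scores)
    new_scores = scores[:]
    for k in active:  # finite-difference gradient for each combiner score
        bumped = scores[:]
        bumped[k] += h
        new_scores[k] -= lr * (loss_for(bumped)[0] - base) / h
    new_w = [row[:] for row in w]
    for x, y in query:  # first-order CE gradient, evaluated at the adapted weights
        q = predict(adapted, x)
        for c, row in enumerate(new_w):
            g = (q[c] - (1.0 if c == y else 0.0)) / len(query)
            for j in range(len(row)):
                row[j] -= lr * g * x[j]
    return new_w, new_scores, base

# Hypothetical source models, one per source domain (assumptions).
def good(x):
    return softmax([3 * x[0], 3 * x[1]])

def flipped(x):
    return softmax([3 * x[1], 3 * x[0]])

def weak(x):
    return softmax([x[0], x[1]])

models = [good, flipped, weak]
points = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2], [0.3, 0.7]]  # shared toy samples
query = [(x, 0 if x[0] > x[1] else 1) for x in points]

w = [[0.0, 0.0], [0.0, 0.0]]
scores = [0.0, 0.0, 0.0]
for epoch in range(5):
    for domain in range(len(models)):
        # Leave-one-out: the model tied to this domain is excluded from the query.
        active = [k for k in range(len(models)) if k != domain]
        w, scores, loss = meta_step(w, scores, points, query, models, active)
print("combiner scores (good, flipped, weak):", [round(s, 3) for s in scores])
```

In this toy setup the combiner score of the informative model should rise relative to the misleading one, because weighting it more heavily lowers the query loss after adaptation. A practical implementation would more likely use automatic differentiation than finite differences.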

Assignees

Huawei Tech Co Ltd

Inventors

Classifications

  • G06N3/088 · Primary

    Non-supervised learning, e.g. competitive learning · CPC title

  • Physics · mapped topic

  • Knowledge-based neural networks; Logical representations of neural networks · CPC title

  • Learning methods · CPC title

  • G06N3/045 · Primary

    Combinations of networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024046107A1 cover?
A method has the steps of obtaining a set of training samples from one or more domains, using the set of training samples to query a plurality of artificial-intelligence (AI) models, combining the outputs of the queried AI models, and adapting a target AI model via knowledge distillation using the combined outputs.
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/088. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 8, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).