Resource-efficient training of a sequence-tagging model

US12511548B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12511548-B2
Application number  US-202218075876-A
Country             US
Kind code           B2
Filing date         Dec 6, 2022
Priority date       Dec 6, 2022
Publication date    Dec 30, 2025
Grant date          Dec 30, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique iteratively updates model weights of a teacher model and a student model. In operation, the teacher model produces noisy original pseudo-labeled training examples from unlabeled training examples. The technique weights the original pseudo-labeled training examples based on validation information. The technique then updates model weights of the student model based on the weighted pseudo-labeled training examples. The validation information, which is used to weight the original pseudo-labeled training examples, is produced by selecting labeled training examples based on an uncertainty-based factor and a similarity-based factor. The uncertainty-based factor describes an extent to which the student model produces uncertain classification results for the set of labeled training examples. The similarity-based factor describes the similarity between the set of labeled training examples and the unlabeled training examples. Overall, the technique is efficient because it eliminates the need to produce a large number of labeled training examples.
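
The selection step in the abstract can be illustrated with a small sketch. It is not the patented implementation: the page does not say how uncertainty or similarity is computed, nor how the two factors are combined. The sketch assumes entropy of the student's predicted class probabilities as the uncertainty measure, cosine similarity between embedding vectors as the individual similarity measure, and a linear mix controlled by a hypothetical `alpha` parameter. Averaging the individual similarities follows claims 9 and 10. All function names are illustrative.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Uncertainty of one prediction (ASSUMPTION: Shannon entropy)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Individual similarity between two embeddings (ASSUMPTION: cosine)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def mean_similarity(candidate: Sequence[float],
                    unlabeled: List[Sequence[float]]) -> float:
    """Overall similarity: average of the individual similarities."""
    return sum(cosine(candidate, u) for u in unlabeled) / len(unlabeled)


def select_validation(candidates: List[Sequence[float]],
                      student_probs: List[Sequence[float]],
                      unlabeled: List[Sequence[float]],
                      k: int,
                      alpha: float = 0.5) -> List[int]:
    """Return indices of the k best-scoring labeled candidates.

    ASSUMPTION: score = alpha * uncertainty + (1 - alpha) * similarity.
    """
    scores = [
        alpha * entropy(p) + (1 - alpha) * mean_similarity(e, unlabeled)
        for e, p in zip(candidates, student_probs)
    ]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    labeled_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    probs = [[0.5, 0.5], [0.99, 0.01]]       # student is unsure about example 0
    unlabeled_embeddings = [[1.0, 0.0], [1.0, 0.1]]
    print(select_validation(labeled_embeddings, probs, unlabeled_embeddings, k=1))
```

In the usage example, candidate 0 is both more uncertain for the student and closer to the unlabeled pool, so it is the one selected for the validation set.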

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for generating a tagging model, comprising: selecting a set of labeled training examples from a collection of labeled training examples, to form validation information, the collection of labeled training examples being less than a collection of unlabeled training examples, and the selecting including assessing appropriateness of a candidate labeled training example for inclusion in the validation information based on: an uncertainty measure that expresses an extent to which a student model produces an uncertain classification result for the candidate labeled training example; and a similarity measure that expresses an extent to which the candidate labeled training example is similar to unlabeled training examples in the collection of unlabeled training examples; producing original pseudo-labeled training examples using a teacher model based on the unlabeled training examples in the collection of unlabeled training examples; weighting the original pseudo-labeled training examples based on the validation information, to produce weighted pseudo-labeled training examples, the weighting reducing an effect of noise in the pseudo-labeled training examples; and updating model weights of the student model based on the weighted pseudo-labeled trained examples, the student model, once trained, corresponding to the tagging model.

2. The computer-implemented method of claim 1, further including repeating the selecting, producing, weighting, and updating plural times.

3. The computer-implemented method of claim 2, further including using the tagging model, following the repeating, to classify tokens in a sequence of tokens.

4. The computer-implemented method of claim 3, wherein the sequence of tokens describes characteristics of an identified product.

5. The computer-implemented method of claim 1, further comprising, after a prescribed number of repetitions of the selecting, producing, weighting, and updating, updating model weights of the teacher model based on current model weights of the student model.

6. The computer-implemented method of claim 1, further including, prior to the selecting, and for at least some of the collection of labeled training examples, masking at least part of the labeled training examples.

7. The computer-implemented method of claim 6, wherein the masking includes masking tokens that have been given an indeterminate classification status.

8. The computer-implemented method of claim 1, wherein the updating the model weights of the student model is also performed based on labeled training examples from the collection of labeled training examples.

9. The computer-implemented method of claim 1, further including generating the similarity measure by: forming individual similarity measures, each individual similarity measure expressing similarity between the candidate labeled training example and a particular unlabeled training example; and forming an overall similarity measure based on the individual similarity measures.

10. The computer-implemented method of claim 9, wherein the overall similarity measure is an average of the individual similarity measures.

11. A computing system, comprising: a processing system comprising a processor; and a storage device for storing machine-readable instructions that, when executed by the processing system, perform operations comprising: classifying tokens in a sequence of tokens using a trained tagging model, to produce a classified sequence of tokens; and performing an application task based on the classified sequence of tokens, the trained tagging model being produced by training a student model in a training framework that includes the student model and a teacher model, model weights of the student model being updated based on pseudo-labeled training examples produced by the teacher model for unlabeled training examples in a collection of unlabeled training examples, the pseudo-labeled training examples being weighted based on validation information, and the validation information including a set of labeled training examples that are selected from a collection of labeled training examples based on plural factors, one of the plural factors being an assessed similarity between the set of labeled training examples and unlabeled training examples in the collection of unlabeled training examples.

12. The computing system of claim 11, wherein the sequence of tokens corresponds to a title of a product, wherein at least some of the tokens describe attributes of the product, and wherein the classifying classifies the types of the attributes.

13. The computing system of claim 11, wherein the performing an application task includes performing a search operation, matching operation, and/or filtering operation based on the classified sequence of tokens.

14. The computing system of claim 11, wherein another factor used to select the labeled set of training examples in the validation information is an assessed uncertainty of classification results produced by the student model for the labeled set of training examples.

15. The computing system of claim 11, wherein, for at least some of the set of labeled training examples, parts of the labeled training examples are masked.

16. The computing system of claim 15, wherein the masking includes masking tokens that have been given an indeterminate classification status.

17. A computer-readable storage medium for storing computer-readable instructions, wherein a processing system executing the computer-readable instructions performs operations comprising: selecting a set of labeled training examples from a collection of labeled training examples based on plural factors, to form validation information, one of the plural factors being an assessed similarity between the set of labeled training examples and unlabeled training examples in a collection of unlabeled training examples; producing original pseudo-labeled training examples using a teacher model based the unlabeled training examples in the collection of unlabeled training examples; weighting the original pseudo-labeled training examples based on the validation information, to produce weighted pseudo-labeled training examples; updating model weights of the student model based on the weighted pseudo-labeled trained examples; and repeating the selecting, producing, weighting, and updating plural times until a training objective is achieved, the student model, after the repeating, corresponding to a tagging model for use in classifying a sequence of tokens.

18. The computer-readable storage medium of claim 17, wherein another factor of the plural factors is an extent to which the student model produces uncertain classification results for the set of labeled training examples.

19. The computer-readable storage medium of claim 17, further including, prior to the selecting, masking at least part of labeled training examples in the collection of labeled training examples.

20. The computer-readable storage medium of claim 19, wherein the masking includes masking tokens that have been given an indeterminate classification status.
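
The loop recited in claims 1, 2, and 5 (pseudo-label, weight, update the student, periodically refresh the teacher) can be sketched on a toy model. This is a rough illustration under stated assumptions, not the method disclosed in the patent. The claims do not say how validation information becomes per-example weights, so the sketch substitutes a gradient-alignment heuristic: a pseudo-labeled example gets more weight when its loss gradient points the same way as the mean gradient on the validation set, and zero weight when it points the opposite way. The binary logistic model, the function names, and the `rounds`, `sync_every`, and `lr` parameters are all hypothetical, and the validation set is held fixed here rather than reselected each round.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, int]  # (features, binary label)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def predict(w: Vector, x: Vector) -> float:
    """Probability of label 1 under a toy logistic model."""
    return 1.0 / (1.0 + math.exp(-dot(w, x)))


def loss_grad(w: Vector, example: Example) -> Vector:
    """Gradient of the logistic loss for a single example."""
    x, y = example
    err = predict(w, x) - y
    return [err * xi for xi in x]


def weigh_pseudo_labels(w: Vector, pseudo: List[Example],
                        validation: List[Example]) -> List[float]:
    """ASSUMPTION: weight = clipped alignment with the mean validation gradient."""
    grads = [loss_grad(w, ex) for ex in validation]
    g_val = [sum(col) / len(grads) for col in zip(*grads)]
    raw = [max(0.0, dot(loss_grad(w, ex), g_val)) for ex in pseudo]
    total = sum(raw)
    return [r / total if total > 0 else 0.0 for r in raw]


def train(student: Vector, teacher: Vector, unlabeled: List[Vector],
          validation: List[Example], rounds: int = 20,
          sync_every: int = 5, lr: float = 1.0) -> Tuple[Vector, Vector]:
    student, teacher = list(student), list(teacher)
    for step in range(1, rounds + 1):
        # Teacher produces (possibly noisy) pseudo-labels for unlabeled inputs.
        pseudo = [(x, int(predict(teacher, x) >= 0.5)) for x in unlabeled]
        # Validation information down-weights pseudo-labels that look noisy.
        weights = weigh_pseudo_labels(student, pseudo, validation)
        # One weighted gradient step on the student.
        grads = [loss_grad(student, ex) for ex in pseudo]
        direction = [sum(wt * g[i] for wt, g in zip(weights, grads))
                     for i in range(len(student))]
        student = [s - lr * d for s, d in zip(student, direction)]
        # After a prescribed number of repetitions, copy student into teacher.
        if step % sync_every == 0:
            teacher = list(student)
    return student, teacher


if __name__ == "__main__":
    validation_set = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
    unlabeled_pool = [[1.0, 0.0], [0.0, 1.0]]
    print(train([0.0, 0.0], [1.0, -1.0], unlabeled_pool, validation_set))
```

Clipping negative alignments to zero is what lets a pseudo-label that contradicts the validation set drop out of the update entirely, which is one way to read "reducing an effect of noise" in claim 1.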

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • G06N3/045 (Primary)

    Combinations of networks · CPC title

  • Machine learning · CPC title

  • G06N3/096 (Primary)

    Transfer learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12511548B2 cover?
A technique iteratively updates model weights of a teacher model and a student model. In operation, the teacher model produces noisy original pseudo-labeled training examples from unlabeled training examples. The technique weights the original pseudo-labeled training examples based on validation information. The technique then updates model weights of the student model based on the weighted pse…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 30, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).