Predictive data and model selection for transfer learning in natural language processing

US11934922B2 · US · B2

Patent metadata
Publication number: US-11934922-B2
Application number: US-202017066685-A
Country: US
Kind code: B2
Filing date: Oct 9, 2020
Priority date: Oct 9, 2020
Publication date: Mar 19, 2024
Grant date: Mar 19, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

A computer system, product, and method are provided. The computer system includes an artificial intelligence (AI) platform operatively coupled to a processor. The AI platform includes tools in the form of a machine learning model (MLM) manager, a metric manager, and a training manager. The MLM manager accesses a plurality of pre-trained source MLMs, and inputs a plurality of data objects of a test dataset into each of the source MLMs. The test dataset includes the plurality of data objects associated with respective labels. For each source MLM, associated labels are generated from the inputted data objects and a similarity metric is calculated. The MLM manager selects a base MLM to be used for transfer learning from the plurality of source MLMs based upon the calculated similarity metric. The training manager trains the selected base MLM with a target dataset for the target domain.
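The selection loop the abstract describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: each pre-trained source MLM is modeled as a plain callable, the similarity metric is assumed to be simple label agreement (the abstract does not fix a formula), and the names `SourceModel`, `label_agreement`, and `select_base_model` are hypothetical. Training the selected base MLM on the target dataset is not shown.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical stand-in for a pre-trained source MLM: any callable that maps
# one data object to one predicted label.
SourceModel = Callable[[str], Hashable]


def label_agreement(gold: Sequence[Hashable], generated: Sequence[Hashable]) -> float:
    """Assumed similarity metric: share of test objects whose generated label equals the gold label."""
    if len(gold) != len(generated):
        raise ValueError("gold and generated labels must be the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, generated)) / len(gold)


def select_base_model(
    source_models: Dict[str, SourceModel],
    test_dataset: List[Tuple[str, Hashable]],
) -> Tuple[str, Dict[str, float]]:
    """Score every source model on the labeled test set; return the best name plus all scores."""
    objects = [obj for obj, _ in test_dataset]
    gold = [label for _, label in test_dataset]
    scores: Dict[str, float] = {}
    for name, model in source_models.items():
        generated = [model(obj) for obj in objects]  # labels generated by this source model
        scores[name] = label_agreement(gold, generated)
    best = max(scores, key=scores.get)  # ties resolve to the first model in dict order
    return best, scores


# Toy usage: two hypothetical "source models" scored on a three-item test set.
test_set = [("aspirin", "DRUG"), ("Paris", "LOC"), ("ibuprofen", "DRUG")]
models = {
    "news_ner": lambda text: "LOC" if text[0].isupper() else "O",
    "clinical_ner": lambda text: "DRUG" if text.endswith(("in", "en")) else "O",
}
best, scores = select_base_model(models, test_set)
print(best)  # clinical_ner
```

Plain accuracy keeps the sketch short; any other similarity measure between gold and generated labels could be swapped into `label_agreement` without changing the selection loop.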

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system comprising: a processor operatively coupled to memory; and an artificial intelligence (AI) platform, in communication with the processor, having machine learning (ML) tools, the tools comprising: a machine learning model (MLM) manager configured to: access a plurality of pre-trained source MLMs; input a plurality of data objects of a test dataset into each of the source MLMs, the test dataset comprising the plurality of data objects associated with labels; and for each of the source MLMs, generate associated labels from the inputted data objects; a metric manager configured to, for each of the source MLMs, calculate a metric reflecting a similarity between the labels of the test dataset and the generated labels; the MLM manager further configured to select a base MLM to be used for transfer learning from the plurality of source MLMs based upon the calculated metric; a training manager configured to train the selected base MLM with a target dataset for the target domain; and the MLM manager further configured to capture knowledge of the selected base MLM, including: use a context representation layer of the selected base MLM; and replace a classification layer of the selected base MLM with a new classifier mapped to space of the target dataset, wherein the calculated metric is limited to a position of a returned output, and the position is selected from a span, a bounding box, or a location.

2. The computer system of claim 1, wherein the metric manager is further configured to: determine, for each of the source MLMs, a respective score based on at least one respective source dataset used to pre-train the source MLM, the score representing an accuracy measure of the source MLM with respect to a respective source domain; and include, for each of the source MLMs, the determined score as a weight in the calculated metric.

3. The computer system of claim 1, wherein the calculated metric comprises, for each of the source MLMs, an assessment of the labels of the test dataset compared to generated labels associated with the source MLMs, wherein each of the assessments is individually selected as a true positive, a false positive, or a false negative.

4. The computer system of claim 3, wherein: the true positive is defined as a match between the label of the dataset and the compared generated label associated with the source MLM; the false positive is defined as a match between the offset of an extracted mention and a non-match with ground truth; and the false negative is defined as a non-match with the offset of the extracted mention and a non-match with ground truth.

5. The computer system of claim 4, further comprising the metric manager to: determine an accuracy measure of one or more inferences of the target dataset processed by one of the source MLMs, the accuracy measure to ignore returned labels and limit consideration to: the position, span, bounding box, or location of the returned output, and an accuracy measure of the source MLM on a source dataset; and leverage the determined accuracy measure to calculate a weight of each of the source MLMs.

6. A computer program product to utilize machine learning to facilitate transfer learning, the computer program product comprising: a tangible computer readable storage medium having program code embodied therewith, the program code executable by a processor to: access a plurality of pre-trained source machine learning models; for each of the source MLMs: input a plurality of data objects of a test dataset into each of the source MLMs, the test dataset comprising the plurality of data objects associated with labels; generate associated labels from the inputted data objects; and calculate a metric reflecting a similarity between the labels of the test dataset and the generated labels; select a base MLM to be used for transfer learning from the plurality of source MLMs based upon the calculated metric; and train the selected base MLM with a target dataset for the target domain, wherein program code is executable by the processor to capture knowledge of the selected base MLM, including: use a context representation layer of the selected base model; and replace a classification layer of the selected base model with a new classifier mapped to space of the target dataset, and wherein the calculated metric is limited to a position of a returned output, and the position is selected from a span, a bounding box, or a location.

7. The computer program product of claim 6, wherein the program code is executable by the processor to: determine, for each of the source MLMs, a respective score based on at least one respective source dataset used to pre-train the source MLM, the score representing an accuracy measure of the source MLM with respect to a respective source domain; and include, for each of the source MLMs, the determined score as a weight in the calculating of the calculated metric.

8. The computer program product of claim 6, wherein the calculated metric comprises, for each of the source MLMs, an assessment of the labels of the test dataset compared to generated labels associated with the source MLMs, wherein each of the assessments is individually selected as a true positive, a false positive, or a false negative.

9. The computer program product of claim 8, wherein: the true positive is defined as a match between the label of the dataset and the compared generated label associated with the source MLM; the false positive is defined as a match between the offset of an extracted mention and a non-match with ground truth; and the false negative is defined as a non-match with the offset of the extracted mention and a non-match with ground truth.

10. The computer program product of claim 9, further comprising program code to: determine an accuracy measure of one or more inferences of the target dataset processed by one of the source MLMs, the accuracy measure to ignore returned labels and limit consideration to: the position, span, bounding box, or location of the returned output, and an accuracy measure of the source MLM on a source dataset; and leverage the determined accuracy measure to calculate a weight of each of the source MLMs.

11. A computer-implemented transfer learning method, comprising: carrying out operations on a computing device comprising a processor operably associated with memory, the operations comprising: accessing a plurality of pre-trained source machine learning models (MLMs); accessing a test dataset of a target domain, the test dataset comprising a plurality of data objects associated with labels; inputting the data objects of the test dataset into each of the source MLMs generating associated labels for each of the source MLMs; for each of the source MLMs, calculating a metric reflecting a similarity between the labels of the test dataset and the generated labels; selecting a base MLM to be used for transfer learning from the plurality of MLMs based upon the calculated metric; and training the selected base MLM with a target dataset for the target domain, wherein selecting the source model as the selected base MLM further comprises capturing knowledge of the selected base MLM, including: using a context representation layer of the selected base model; and replacing a classification layer of the selected base model with a new classifier mapped to space of the target dataset, and wherein the calculated metric is limited to a position of a returned output, and the position is selected from a span, bounding box, or location.

12. The method of claim 11, wherein calculating the calculated metric
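The claims narrow the metric to the position of the returned output (for text, a span) and optionally weight it by each source model's accuracy on its own source data. The sketch below illustrates one reading of that idea under stated assumptions: spans are (start, end) offset pairs, a true positive is an exact offset match, false positives and false negatives follow the conventional span-level definitions rather than the claims' own wording, and the source-accuracy weight is applied multiplicatively (the claims only say the score is included "as a weight"). All function names and example values are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of one extracted mention


def span_counts(gold: List[Span], predicted: List[Span]) -> Tuple[int, int, int]:
    """TP/FP/FN for one document, matching on offsets only (returned labels are ignored)."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    tp = len(gold_set & pred_set)  # predicted span coincides with a gold span
    fp = len(pred_set - gold_set)  # predicted span with no gold counterpart
    fn = len(gold_set - pred_set)  # gold span the model did not return
    return tp, fp, fn


def span_f1(gold_docs: List[List[Span]], pred_docs: List[List[Span]]) -> float:
    """Micro-averaged span F1 over all test documents."""
    if len(gold_docs) != len(pred_docs):
        raise ValueError("need one prediction list per test document")
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        d_tp, d_fp, d_fn = span_counts(gold, pred)
        tp, fp, fn = tp + d_tp, fp + d_fp, fn + d_fn
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_metric(gold_docs: List[List[Span]], pred_docs: List[List[Span]],
                    source_accuracy: float) -> float:
    """Assumed weighting: scale target-side span F1 by the model's accuracy on its source data."""
    return source_accuracy * span_f1(gold_docs, pred_docs)


# Toy usage with hypothetical offsets and source accuracies.
gold_docs = [[(0, 7), (12, 17)], [(3, 9)]]
candidates: Dict[str, Tuple[List[List[Span]], float]] = {
    "model_a": ([[(0, 7)], [(3, 9), (20, 25)]], 0.9),  # span F1 = 2/3
    "model_b": ([[(0, 7), (12, 17)], [(3, 9)]], 0.5),  # span F1 = 1.0
}
scores = {name: weighted_metric(gold_docs, preds, acc)
          for name, (preds, acc) in candidates.items()}
best = max(scores, key=scores.get)
print(best)  # model_a (0.9 * 2/3 = 0.6 beats 0.5 * 1.0 = 0.5)
```

Under this assumed weighting, `model_b` finds every gold span on the test set, yet its weaker source accuracy lets `model_a` rank first. Because spans are compared as sets, duplicate predictions of the same span count only once.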

Frequently asked questions

What does patent US11934922B2 cover?
A computer system, product, and method are provided. The computer system includes an artificial intelligence (AI) platform operatively coupled to a processor. The AI platform includes tools in the form of a machine learning model (MLM) manager, a metric manager, and a training manager. The MLM manager accesses a plurality of pre-trained source MLMs, and inputs a plurality of data objects of a t…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 19, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).