Cross-modal semi-supervised data labeling

US12417404B2 · US · B2

Patent metadata
Publication number: US-12417404-B2
Application number: US-202017108240-A
Country: US
Kind code: B2
Filing date: Dec 1, 2020
Priority date: Dec 1, 2020
Publication date: Sep 16, 2025
Grant date: Sep 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One or more computer processors extract respective features for each inter-modal sample in an inter-modal dataset, for each intra-modal sample in an intra-modal dataset, and a subsequent sample, wherein the inter-modal dataset and the intra-modal dataset are contained in a multi-modal training dataset. The one or more computer processors estimate an inter-modal label utilizing inter-modal label transformation of a subsequent sample. The one or more computer processors estimate an intra-modal label utilizing intra-modal label transformation of the subsequent sample. The one or more computer processors label the subsequent sample with a cross-modal label by combining the estimated inter-modal label and the estimated intra-modal label.
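The abstract does not say how the two label transformations work. The claims describe them as: score each dataset sample against the subsequent sample, combine that score with the sample's label vector, aggregate, and normalize. The following Python sketch illustrates that reading under stated assumptions. It assumes that text and image features were already extracted into one shared vector space (so they can be compared), that the similarity score is cosine similarity, that "normalizing" means dividing by the sum, and that the two estimates are combined by a weighted average with a hypothetical weight `alpha`. The function names are illustrative and do not come from the patent.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity score: cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_transformation(
    features: Sequence[Sequence[float]],
    labels: Sequence[Sequence[float]],
    query: Sequence[float],
) -> List[float]:
    """Similarity-weighted label transfer over one dataset (inter- or intra-modal)."""
    aggregated = [0.0] * len(labels[0])
    for sample_features, label in zip(features, labels):
        # Score the dataset sample against the subsequent sample. Negative cosine
        # scores are clamped to 0 (an assumption) to keep the weights non-negative.
        score = max(cosine_similarity(sample_features, query), 0.0)
        # Combine the score with the sample's vectorized label and accumulate it.
        for k, value in enumerate(label):
            aggregated[k] += score * value
    # Normalize so the entries sum to 1 (assumed form of normalization).
    total = sum(aggregated)
    return [v / total for v in aggregated] if total else aggregated


def cross_modal_label(
    inter_label: Sequence[float], intra_label: Sequence[float], alpha: float = 0.5
) -> List[float]:
    """Combine both estimates; the weighted average and `alpha` are hypothetical."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(inter_label, intra_label)]


# Toy data: two classes, with all features assumed to share one 2-D embedding space.
text_features = [[1.0, 0.0], [0.0, 1.0]]
text_labels = [[1.0, 0.0], [0.0, 1.0]]
image_features = [[0.9, 0.1], [0.2, 0.8]]
image_labels = [[1.0, 0.0], [0.0, 1.0]]
subsequent = [1.0, 0.0]

inter = label_transformation(text_features, text_labels, subsequent)  # → [1.0, 0.0]
intra = label_transformation(image_features, image_labels, subsequent)
combined = cross_modal_label(inter, intra)
print(inter, intra, combined)
```

With `alpha=0.5`, each modality contributes equally to the cross-modal label. Claim 4 states that the label may be a binary vector, so a real system would presumably need an extra step, such as thresholding, to turn these soft scores into one.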

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: extracting, by one or more computer processors, respective features for: (i) each inter-modal sample in an inter-modal dataset, (ii) for each intra-modal sample in an intra-modal dataset, and (iii) a subsequent sample, wherein the inter-modal dataset and the intra-modal dataset are contained in a multi-modal training dataset, wherein each intra-modal sample is an image sample and each inter-modal sample is a textual sample, comprising: identifying, by one or more computer processors, one or more points in an intra-modal sample that contain discontinuities; utilizing, by one or more computer processors, corpus linguistic analysis to evaluate each inter-model sample; predicting, by one or more computer processors, an inter-modal label utilizing inter-modal label transformation of a subsequent sample, comprising: identifying, by one or more computer processors, a similarity score for each inter-modal sample in the inter-modal dataset and the subsequent sample by comparing extracted features for each inter-modal sample in the inter-modal dataset with extracted features for the subsequent sample and combining the similarity score with a vectorized label associated with the respective inter-modal sample; aggregating, by one or more computer processors, the combined similarity scores; creating, by one or more computer processors, the predicted inter-modal label by normalizing aggregated similarity scores; predicting, by one or more computer processors, an intra-modal label utilizing intra-modal label transformation of the subsequent sample; labeling, by one or more computer processors, the subsequent sample with a cross-modal label by combining the predicted inter-modal label and the predicted intra-modal label; creating, by one or more computer processors, a training dataset of cross-modal labels by relabeling each training sample in the multi-modal training dataset with created cross-modal labels; training, by one or more computer processors, a model with the created training dataset; and classifying, by one or more computer processors, another sample utilizing the trained model.

2. The computer-implemented method of claim 1, wherein the intra-modal label transformation, comprises: calculating, by one or more computer processors, a similarity score for each inter-modal sample in the intra-modal dataset and the subsequent sample by comparing the extracted features for each intra-modal sample in the intra-modal dataset with extracted features for the subsequent sample and multiplying the calculated similarity score with a label associated with the respective intra-modal sample, aggregating, by one or more computer processors, the multiplied calculated similarity scores; and creating, by one or more computer processors, a predicted intra-modal label by normalizing the aggregated similarity scores.

3. The computer-implemented method of claim 1, further comprising: training, by one or more computer processors, a model utilizing the labeled subsequent sample.

4. The computer-implemented method of claim 1, wherein the label of the subsequent sample is a binary vector.

5. The computer-implemented method of claim 1, wherein the subsequent sample is selected from the group consisting of a textual snippet and an image.

6. The computer-implemented method of claim 1, wherein the multi-modal training dataset comprises labeled textual snippets, images, and videos.

7. A computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the stored program instructions comprising: program instructions to extract respective features for: (i) each inter-modal sample in an inter-modal dataset, (ii) for each intra-modal sample in an intra-modal dataset, and (iii) a subsequent sample, wherein the inter-modal dataset and the intra-modal dataset are contained in a multi-modal training dataset, wherein each intra-modal sample is an image sample and each inter-modal sample is a textual sample, wherein the program instructions comprise: program instructions to identify one or more points in an intra-modal sample that contain discontinuities; program instructions to utilize corpus linguistic analysis to evaluate each inter-model sample; program instructions to predict an inter-modal label utilizing inter-modal label transformation of a subsequent sample, comprising: program instructions to identify a similarity score for each inter-modal sample in the inter-modal dataset and the subsequent sample by comparing extracted features for each inter-modal sample in the inter-modal dataset with extracted features for the subsequent sample and combining the similarity score with a label associated with the respective inter-modal sample; program instructions to aggregate the combined similarity scores; program instructions to create the predicted inter-modal label by normalizing aggregated similarity scores; and program instructions to predict an intra-modal label utilizing intra-modal label transformation of the subsequent sample; program instructions to label the subsequent sample with a cross-modal label by combining the predicted inter-modal label and the predicted intra-modal label; program instructions to create a training dataset of cross-modal labels by relabeling each training sample in the multi-modal training dataset with created cross-modal labels; program instructions to train a model with the created training dataset; and program instructions to classify another sample utilizing the trained model.

8. The computer program product of claim 7, wherein the program instructions, to intra-modal label transformation, comprise: program instructions to calculate a similarity score for each inter-modal sample in the intra-modal dataset and the subsequent sample by comparing the extracted features for each intra-modal sample in the intra-modal dataset with extracted features for the subsequent sample and multiplying the calculated similarity score with a label associated with the respective intra-modal sample; program instructions to aggregate the multiplied calculated similarity scores; and program instructions to create a predicted intra-modal label by normalizing the aggregated similarity scores.

9. The computer program product of claim 7, wherein the program instructions, stored on the one or more computer readable storage media, further comprise: program instructions to train a model utilizing the labeled subsequent sample.

10. The computer program product of claim 7, wherein the label of the subsequent sample is a binary vector.

11. The computer program product of claim 7, wherein the subsequent sample is selected from the group consisting of a textual snippet and an image.

12. The computer program product of claim 7, wherein the multi-modal training dataset comprises labeled textual snippets, images, and videos.

13. A computer system comprising: one or more computer processors; one or more computer readable storage media; and program instructions stored on the computer readable storage media for execution by at least one of the one or more processors, the stored program instructions comprising: program instructions to extract respective features for: (i) each inter-modal sample in an inter-modal dataset, (ii) for each intra-modal sample in an intra-modal dataset, and (iii) a subsequent sample, wherein the inter-modal dataset and the intra-modal dataset are contained in a multi-modal training dataset, wherein each intra-modal sample is an image sample and each inter-modal s
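Claim 1 also requires identifying points in each image sample that contain discontinuities, but it does not say how those points are found. One conventional reading is an intensity-edge test. The Python sketch below flags pixels whose forward-difference gradient magnitude exceeds a threshold. The function name `discontinuity_points`, the grayscale list-of-lists image format, and the threshold value are all assumptions made for illustration. This is not the patented method.

```python
from typing import List, Sequence, Tuple


def discontinuity_points(
    image: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (row, col) pixels where the local intensity changes sharply.

    Hypothetical stand-in for the unspecified discontinuity test in claim 1:
    the gradient is approximated by forward differences to the right-hand and
    lower neighbours, taken as 0 at the image border.
    """
    rows, cols = len(image), len(image[0])
    points: List[Tuple[int, int]] = []
    for r in range(rows):
        for c in range(cols):
            dx = image[r][c + 1] - image[r][c] if c + 1 < cols else 0.0
            dy = image[r + 1][c] - image[r][c] if r + 1 < rows else 0.0
            # Flag the pixel when the gradient magnitude exceeds the threshold.
            if (dx * dx + dy * dy) ** 0.5 > threshold:
                points.append((r, c))
    return points


# A 3x3 grayscale image with a vertical step edge between columns 1 and 2.
step_image = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
print(discontinuity_points(step_image, threshold=0.5))  # → [(0, 1), (1, 1), (2, 1)]
```

In a full pipeline, the flagged points, or a summary such as their count, could feed into the image's feature vector. The claim text shown here does not state how they are used downstream.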

Assignees

IBM

Inventors

Classifications

  • Clustering or classification · CPC title

  • G06N20/00 (primary)

    Machine learning · CPC title

Patent family

Related publications grouped by family.


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12417404B2 cover?
One or more computer processors extract respective features for each inter-modal sample in an inter-modal dataset, for each intra-modal sample in an intra-modal dataset, and a subsequent sample, wherein the inter-modal dataset and the intra-modal dataset are contained in a multi-modal training dataset. The one or more computer processors estimate an inter-modal label utilizing inter-modal label…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
The B2 grant was published on Sep 16, 2025. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).