Labeling of data for machine learning

US10902352B2 · US · B2

Patent metadata
Publication number: US-10902352-B2
Application number: US-202016734570-A
Country: US
Kind code: B2
Filing date: Jan 6, 2020
Priority date: Jun 5, 2014
Publication date: Jan 26, 2021
Grant date: Jan 26, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer generates labels for machine learning algorithms by retrieving, from a data storage circuit, multiple label sets that contain labels that each classify data points in a corpus of data. A graph is generated that includes a plurality of edges, each edge between two respective labels from different label sets of the multiple label sets. Weights are determined for the plurality of edges based upon a consistency between data points classified by two labels connected by the edges. An algorithm is applied that groups labels from the multiple label sets based upon the weights for the plurality of edges. Data points are identified from the corpus of data that represent conflicts within the grouped labels. An electronic message is transmitted in order to present the identified data points to entities for further classification. A new label set is generated using the further classification received from the entities.
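
The graph and edge-weight steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract only says that weights reflect "a consistency between data points classified by two labels", so the Jaccard overlap used here is an assumption, and the names `jaccard`, `build_label_graph`, and the sample annotator data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two sets of data point ids (assumed consistency measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_label_graph(label_sets):
    """Build weighted edges between labels that come from different label sets.

    label_sets maps a label-set name to {label: set of data point ids}.
    Returns {((set_a, label_a), (set_b, label_b)): weight}.
    """
    edges = {}
    # combinations() pairs each label set only with other label sets,
    # so no edge ever joins two labels from the same set.
    for (name_a, labels_a), (name_b, labels_b) in combinations(label_sets.items(), 2):
        for label_a, points_a in labels_a.items():
            for label_b, points_b in labels_b.items():
                weight = jaccard(points_a, points_b)
                if weight > 0:
                    edges[((name_a, label_a), (name_b, label_b))] = weight
    return edges


# Hypothetical example: two annotators labeling the same six data points.
label_sets = {
    "annotator_1": {"spam": {1, 2, 3}, "ham": {4, 5, 6}},
    "annotator_2": {"junk": {1, 2, 4}, "ok": {3, 5, 6}},
}
edges = build_label_graph(label_sets)
```

In this sample, "spam" and "junk" share two of four covered points, so their edge carries a higher weight than the edge between "spam" and "ok". Any other agreement measure could be substituted for `jaccard` without changing the graph structure.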

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method for generating labels for machine learning algorithms, the method comprising: generating a graph that includes a plurality of edges, each edge between two respective labels from different label sets of multiple label sets, the multiple label sets containing labels that each classify data points in a corpus of data; determining weights for the plurality of edges based upon a consistency between the data points classified by two labels connected by the edges; applying an algorithm that creates grouped labels from the multiple label sets and based upon the weights for the plurality of edges; and identifying data points from the corpus of data that represent conflicts within the grouped labels; and presenting the identified data points to entities for further classification.

2. The method of claim 1, further comprising receiving further classification of the multiple label sets from the entities.

3. The method of claim 2, generating a new label set based upon the grouped labels and the further classification received from the entities.

4. A computer system for generating labels for machine learning algorithms, the computer system comprising: at least one processor circuit and computer readable storage device that are configured to include: a label set comparison module configured to: generate a graph that includes a plurality of edges, each edge between two respective labels from different label sets of multiple label sets, the multiple label sets containing labels that each classify data points in a corpus of data; determine weights for the plurality of edges based upon a consistency between data points classified by two labels connected by the edges; a label set coordinator module configured to: apply an algorithm that creates grouped labels from the multiple label sets and based upon the weights for the plurality of edges; and identify data points from the corpus of data that represent conflicts within the grouped labels; and a label set issue handler module configured to: present the identified data points to entities for further classification.

5. The system of claim 4, the label set issue handler module further configured to receive further classification of the multiple label sets from the entities.

6. The system of claim 4, the label set issue handler module further configured to generate a new label set based upon the grouped labels and the further classification received from the entities.

7. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising: generating a graph that includes a plurality of edges, each edge between two respective labels from different label sets of multiple label sets, the multiple label sets containing labels that each classify data points in a corpus of data; determining weights for the plurality of edges based upon a consistency between the data points classified by two labels connected by the edges; applying an algorithm that creates grouped labels from the multiple label sets and based upon the weights for the plurality of edges; identifying data points from the corpus of data that represent conflicts within the grouped labels; presenting the identified data points to entities for further classification.
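
The grouping and conflict-identification steps recited in the claims could look roughly like the sketch below. The claims do not name a specific grouping algorithm or define "conflict" precisely, so both choices here are assumptions: labels are grouped by connected components over edges at or above a threshold, and a data point counts as a conflict when some label sets in a group cover it and others do not. The function names, the threshold value, and the sample weights are all hypothetical.

```python
def group_labels(edges, threshold):
    """Group labels joined by edges with weight >= threshold (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for (a, b), weight in edges.items():
        root_a, root_b = find(a), find(b)  # registers both labels
        if weight >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def find_conflicts(group, members):
    """Return data points that the label sets within one group disagree on.

    members maps (label_set_name, label) to the set of data point ids.
    """
    by_set = {}
    for set_name, label in group:
        by_set.setdefault(set_name, set()).update(members[(set_name, label)])
    covered = set().union(*by_set.values())
    agreed = set.intersection(*by_set.values())
    return covered - agreed


# Hypothetical membership and edge weights for two label sets.
members = {
    ("a1", "spam"): {1, 2, 3}, ("a1", "ham"): {4, 5, 6},
    ("a2", "junk"): {1, 2, 4}, ("a2", "ok"): {3, 5, 6},
}
edges = {
    (("a1", "spam"), ("a2", "junk")): 0.5,
    (("a1", "spam"), ("a2", "ok")): 0.2,
    (("a1", "ham"), ("a2", "junk")): 0.2,
    (("a1", "ham"), ("a2", "ok")): 0.5,
}
groups = group_labels(edges, threshold=0.4)
conflicts = sorted(set().union(*(find_conflicts(g, members) for g in groups)))
print(conflicts)  # [3, 4]: the points to present for further classification
```

With these sample values, "spam" is grouped with "junk" and "ham" with "ok", and data points 3 and 4 are flagged because the two label sets place them in different groups. In the claimed method, such points would then be presented to entities for further classification.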

Assignees

IBM

Classifications

  • Clustering; Classification · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • Indexing structures · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Ensuring data consistency and integrity · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10902352B2 cover?
A computer generates labels for machine learning algorithms by retrieving, from a data storage circuit, multiple label sets that contain labels that each classify data points in a corpus of data. A graph is generated that includes a plurality of edges, each edge between two respective labels from different label sets of the multiple label sets. Weights are determined for the plurality of edges …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 26, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).