Refining classification results based on glossary relationships

US11308128B2 · US · B2

Patent metadata
Publication number: US-11308128-B2
Application number: US-201715837788-A
Country: US
Kind code: B2
Filing date: Dec 11, 2017
Priority date: Dec 11, 2017
Publication date: Apr 19, 2022
Grant date: Apr 19, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, system and computer program product for classifying a data collection of data of a predefined domain. A hierarchical representation scheme describing terms of the domain and one or more relationships between the terms is provided. At least one classifier may be applied on the data collection, resulting in a set of term assignments. Each term assignment of the term assignments associates a term candidate with a respective confidence value to the collection or to one or more data items of the collection. At least one of the term assignments may be refined based on the representation scheme and the set of term assignments.
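The classification step in the abstract, where classifiers emit term assignments with confidence values, can be sketched as follows. This is a minimal illustration, not IBM's implementation: the `TermAssignment` tuple follows the four fields listed in claim 1 (data, classifier, term candidate, confidence), while the two toy classifiers, the column names, and the terms `PostalCode` and `City` are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class TermAssignment(NamedTuple):
    """One classifier output, following the four fields listed in claim 1."""
    data: str          # identifier of the classified data (collection or data item)
    classifier: str    # name of the classifier that was applied
    term: str          # term candidate from the domain glossary
    confidence: float  # confidence value, assumed to lie in [0, 1]


# Hypothetical classifier signature: (data id, sample values) -> [(term, confidence)]
Classifier = Callable[[str, List[str]], List[Tuple[str, float]]]


def name_classifier(name: str, values: List[str]) -> List[Tuple[str, float]]:
    """Toy classifier (hypothetical) that looks only at the column name."""
    lowered = name.lower()
    if "zip" in lowered or "postal" in lowered:
        return [("PostalCode", 0.8)]
    if "city" in lowered:
        return [("City", 0.9)]
    return []


def value_classifier(name: str, values: List[str]) -> List[Tuple[str, float]]:
    """Toy classifier (hypothetical): five-digit strings suggest US ZIP codes."""
    if values and all(v.isdigit() and len(v) == 5 for v in values):
        return [("PostalCode", 0.6)]
    return []


def apply_classifiers(items: Dict[str, List[str]],
                      classifiers: Dict[str, Classifier]) -> List[TermAssignment]:
    """Run every classifier on every data item and collect one tuple per result."""
    assignments: List[TermAssignment] = []
    for data_id, values in items.items():
        for clf_name, clf in classifiers.items():
            for term, conf in clf(data_id, values):
                assignments.append(TermAssignment(data_id, clf_name, term, conf))
    return assignments


if __name__ == "__main__":
    columns = {"zip_code": ["10001", "94105"], "city": ["New York", "San Francisco"]}
    for a in apply_classifiers(columns, {"name": name_classifier, "values": value_classifier}):
        print(a)
```

On the two sample columns this yields three assignments: two `PostalCode` assignments for `zip_code` (one per classifier, with confidences 0.8 and 0.6) and one `City` assignment for `city`. Keeping the classifier name in each tuple is what later allows assignments for the same data and term to be combined with per-classifier weights.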

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer program product for classifying a data collection of data of a predefined domain, wherein the data collection comprises a collection of one or more data sets comprising data items, wherein the one or more data sets comprise one or more tables or files represented by a database or a directory, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code comprising programming instructions for: providing a hierarchical representation scheme describing terms of the domain and one or more relationships between the terms; applying at least one classifier on the data collection, resulting in a set of term assignments, each term assignment of the term assignments associating a term candidate with a respective confidence value to the collection or to one or more of the data items of the collection, wherein each term assignment of the term assignments is structured in a form of a tuple that indicates data of the data collection that is processed or classified, a classifier being applied, the term candidate assigned to the data processed or classified and the associated confidence value; and refining at least one of the term assignments based on the representation scheme and the set of term assignments.

2. The computer program product as recited in claim 1, wherein the refining comprises: using the term candidate assigned to the collection for determining a set of assignable terms to the collection from the representation scheme; determining from the representation scheme supporting terms (ST) that are related to the assignable terms; identifying each data item of the collection that can be assigned to a term of the supporting terms using the set of term assignments, and providing a set of term assignments of the identified data items; and refining the term assignments of one or more of the collection and the data items using the set of term assignments of the identified data items.

3. The computer program product as recited in claim 2, wherein determining supporting terms of a given assignable term comprises: determining first set of terms comprising terms having a parent inheritance relationship with the given assignable term, the first set of terms further comprises the assignable term and for each first term of the first set determining a second set of terms related to the first term by an associative relationship, wherein the supporting terms comprise the second sets of terms.

4. The computer program product as recited in claim 3, wherein the parent inheritance relationship is a single inheritance relation being an is-a relationship.

5. The computer program product as recited in claim 3, wherein the associative relationship indicates that the first term has an has-a relationship with a second term.

6. The computer program product as recited in claim 2, wherein the set of assignable terms comprises the term candidate and terms having a child inheritance relationship with the term candidate.

7. The computer program product as recited in claim 2, wherein the at least one classifier comprises multiple classifiers, wherein the program code further comprises the programming instructions for: identifying in the set of term assignments a subset of term assignments having each a same given data and same given term assigned to the given data, combining the confidence values of the subset of term assignments for providing a combined term assignment assigning the given term to the given data with the combined confidence value, wherein the refining is performed on at least part of the combined term assignments.

8. The computer program product as recited in claim 1, wherein the representation scheme comprises an ontology tree providing an ontology of the domain describing the terms and the hierarchy of the terms.

9. The computer program product as recited in claim 1, wherein the at least one classifier comprises multiple classifiers, wherein the program code further comprises the programming instructions for: identifying in the set of term assignments a subset of term assignments having each a same given data and same given term assigned to the given data, combining the confidence values of the subset of term assignments for providing a combined term assignment assigning the given term to the given data with the combined confidence value, wherein the refining is performed on the combined term assignment.

10. The computer program product as recited in claim 9, wherein combining the confidence values comprises: assigning a weight to a classifier of a respective term assignment of the subset of term assignments and performing a sum of the confidence values weighted by respective weights and divided by the number of summed confidence values.

11. A system, comprising: a memory unit for storing a computer program for classifying a data collection of data of a predefined domain, wherein the data collection comprises a collection of one or more data sets comprising data items, wherein the one or more data sets comprise one or more tables or files represented by a database or a directory; and a processor coupled to the memory unit, wherein the processor is configured to execute program instructions of the computer program comprising: providing a hierarchical representation scheme describing terms of the domain and one or more relationships between the terms; applying at least one classifier on the data collection, resulting in a set of term assignments, each term assignment of the term assignments associating a term candidate with a respective confidence value to the collection or to one or more of the data items of the collection, wherein each term assignment of the term assignments is structured in a form of a tuple that indicates data of the data collection that is processed or classified, a classifier being applied, the term candidate assigned to the data processed or classified and the associated confidence value; and refining at least one of the term assignments based on the representation scheme and the set of term assignments.

12. The system as recited in claim 11, wherein the refining comprises: using the term candidate assigned to the collection for determining a set of assignable terms to the collection from the representation scheme; determining from the representation scheme supporting terms (ST) that are related to the assignable terms; identifying each data item of the collection that can be assigned to a term of the supporting terms using the set of term assignments, and providing a set of term assignments of the identified data items; and refining the term assignments of one or more of the collection and the data items using the set of term assignments of the identified data items.

13. The system as recited in claim 12, wherein determining supporting terms of a given assignable term comprises: determining first set of terms comprising terms having a parent inheritance relationship with the given assignable term, the first set of terms further comprises the assignable term and for each first term of the first set determining a second set of terms related to the first term by an associative relationship, wherein the supporting terms comprise the second sets of terms.

14. The system as recited in claim 13, wherein the parent inheritance relationship is a single inheritance relation being an is-a relationship.

15. The system as recited in claim 13, wherein the associative relationship indicates that the first term has an has-a relationship with a second term.

16. The syste…
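Claims 3 to 6 and 10 describe concrete set and arithmetic operations, sketched below under stated assumptions; this is one possible reading, not the patentee's code. The glossary is a hypothetical single-inheritance tree held in two dictionaries (is-a parents and has-a associations), "parent inheritance" and "child inheritance" are read transitively (all ancestors, all descendants), and the terms and classifier weights are illustrative. The claims shown do not say how the identified data items change the confidence values during refinement, so that step is left out.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class TermAssignment(NamedTuple):
    data: str
    classifier: str
    term: str
    confidence: float


class Glossary:
    """Hypothetical representation scheme: single is-a inheritance plus has-a links."""

    def __init__(self, parent: Dict[str, str], has_a: Dict[str, Set[str]]):
        self.parent = parent  # child term -> its single is-a parent (claim 4)
        self.has_a = has_a    # term -> terms it has a has-a relationship with (claim 5)

    def ancestors(self, term: str) -> List[str]:
        """All is-a ancestors of term, nearest first (transitive reading, an assumption)."""
        chain = []
        while term in self.parent:
            term = self.parent[term]
            chain.append(term)
        return chain

    def descendants(self, term: str) -> Set[str]:
        """All terms below term in the is-a tree (transitive reading, an assumption)."""
        children = {c for c, p in self.parent.items() if p == term}
        result = set(children)
        for child in children:
            result |= self.descendants(child)
        return result

    def assignable_terms(self, candidate: str) -> Set[str]:
        """Claim 6: the term candidate plus terms that inherit from it."""
        return {candidate} | self.descendants(candidate)

    def supporting_terms(self, assignable: str) -> Set[str]:
        """Claim 3: has-a neighbours of the term itself and of each is-a ancestor."""
        first_set = [assignable] + self.ancestors(assignable)
        supporting: Set[str] = set()
        for term in first_set:
            supporting |= self.has_a.get(term, set())
        return supporting


def combine(assignments: List[TermAssignment],
            weights: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Claim 10: per (data, term), sum weight * confidence, then divide by the count."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for a in assignments:
        # Classifiers without an explicit weight default to 1.0 (an assumption).
        groups[(a.data, a.term)].append(weights.get(a.classifier, 1.0) * a.confidence)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    g = Glossary(parent={"Customer": "Party", "Employee": "Party"},
                 has_a={"Party": {"Address", "Name"}, "Customer": {"CustomerId"}})
    print(sorted(g.assignable_terms("Party")))
    print(sorted(g.supporting_terms("Customer")))
    print(combine([TermAssignment("zip", "name", "PostalCode", 0.8),
                   TermAssignment("zip", "values", "PostalCode", 0.6)],
                  {"name": 1.0, "values": 0.5}))
```

In the example, the assignable terms for `Party` are `Party`, `Customer`, and `Employee`, and the supporting terms for `Customer` are `CustomerId` plus the `Address` and `Name` it inherits through `Party`. The combined confidence for `zip` as `PostalCode` is (1.0 × 0.8 + 0.5 × 0.6) / 2 = 0.55. Because claim 10 divides by the number of summed values rather than by the sum of the weights, weights below 1 pull the combined value down; with every weight at 1.0 the formula reduces to a plain mean.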

Assignees

IBM

Classifications

  • G06F16/285 (primary)

    Clustering or classification · CPC title

  • Hierarchical databases, e.g. IMS, LDAP data stores or Lotus Notes · CPC title

Frequently asked questions

What does patent US11308128B2 cover?
A method, system and computer program product for classifying a data collection of data of a predefined domain. A hierarchical representation scheme describing terms of the domain and one or more relationships between the terms is provided. At least one classifier may be applied on the data collection, resulting in a set of term assignments. Each term assignment of the term assignments associat…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/285. Mapped technology areas include Physics.
When was this patent published?
Published on April 19, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).