Maintenance of a data glossary

US12050866B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12050866-B2
Application number: US-202017120201-A
Country: US
Kind code: B2
Filing date: Dec 13, 2020
Priority date: Dec 13, 2020
Publication date: Jul 30, 2024
Grant date: Jul 30, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may receive a data glossary comprising a list of terms. The system may then measure a usage dimension for a set of the terms from the list of terms. The system may select a candidate term from the set based on the usage dimension and perform a maintenance action on the candidate terms.
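The four steps in the abstract (receive, measure, select, maintain) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Term` class, the `usage_log` list, the simple occurrence count used as the usage dimension, and the "flag for review" maintenance action are all assumptions made for illustration, since the abstract does not say how usage is measured or what the maintenance action is.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Term:
    """One glossary entry (hypothetical structure)."""
    name: str
    definition: str
    needs_review: bool = False


def measure_usage(terms, usage_log):
    """Usage dimension, assumed here to be a raw occurrence count per term."""
    counts = Counter(usage_log)
    return {t.name: counts.get(t.name, 0) for t in terms}


def select_candidates(terms, usage, top_n=1):
    """Pick the top_n most-used terms as maintenance candidates."""
    ranked = sorted(terms, key=lambda t: usage[t.name], reverse=True)
    return ranked[:top_n]


def perform_maintenance(candidates):
    """Maintenance action, assumed here to be flagging the term for review."""
    for term in candidates:
        term.needs_review = True


# Receive a glossary (list of terms) and a hypothetical log of term lookups.
glossary = [
    Term("customer", "A person who buys."),
    Term("churn", ""),
    Term("SKU", "Stock keeping unit."),
]
usage_log = ["churn", "customer", "churn", "SKU", "churn"]

usage = measure_usage(glossary, usage_log)
candidates = select_candidates(glossary, usage)
perform_maintenance(candidates)
print([t.name for t in candidates])  # ['churn']
```

Here "churn" is looked up most often, so it is the term selected and flagged for review.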

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: receiving a data glossary comprising a list of terms; training a machine learning model with one or more training sets of data based on the data glossary; measuring a usage dimension for a set of the terms from the list of terms; selecting candidate terms from the set based on the usage dimension with the trained machine learning model, wherein the usage dimension indicates a need for definition improvement of the selected candidate terms, wherein the usage dimension is based on frequency of use and need for improvement; determining maintenance priority for the candidate terms based on the usage dimension; and performing a maintenance action on the candidate terms based on the selecting by the machine learning model and the maintenance priority.

2. The method of claim 1, the selecting further comprising: determining that a term has crossed a maintenance threshold based on the usage dimension.

3. The method of claim 2 the selecting further comprising: providing the term to a user; receiving, from the user, a directive to proceed with the maintenance action; and including the term in the candidate terms.

4. The method of claim 1, further comprising: storing one or more relationship attributes representing a related data structure for a grouping of the terms.

5. The method of claim 4, wherein the data structure is selected from the group consisting of a logical rule, a file, a file type, and a content type.

6. The method of claim 1, further comprising: skipping the performing of the maintenance action for a term based on a skip indicator in the data glossary.

7. The method of claim 1 the measuring further comprising: using a maintenance algorithm to determine the usage dimension of a term; including the term in the list of terms considered for the maintenance action; and prioritizing the list of terms considered for the maintenance action based on a maintenance priority.

8. The method of claim 1, wherein the usage dimension is based on the usage of a term by one or more users.

9. A system comprising: a memory; and a processor in communication with the memory, the processor being configured to perform processes comprising: receiving a data glossary comprising a list of terms; training a machine learning model with one or more training sets of data based on the data glossary; measuring a usage dimension for a set of the terms from the list of terms; selecting candidate terms from the set based on the usage dimension with the trained machine learning model, wherein the usage dimension indicates a need for definition improvement of the selected candidate terms, wherein the usage dimension is based on frequency of use and need for improvement; determining maintenance priority for the candidate terms based on the usage dimension; and performing a maintenance action on the candidate terms based on the selecting by the machine learning model and the priority.

10. The system of claim 9, the selecting further comprising: determining that a term has crossed a maintenance threshold based on the usage dimension.

11. The system of claim 10 the selecting further comprising: providing the term to a user; receiving, from the user, a directive to proceed with the maintenance action; and including the term in the candidate terms.

12. The system of claim 9, the process further comprising: storing one or more relationship attributes representing a related data structure for a grouping of the terms.

13. The system of claim 12, wherein the data structure is selected from the group consisting of a logical rule, a file, a file type, and a content type.

14. The system of claim 9, the process further comprising: skipping the performing of the maintenance action for a term based on a skip indicator in the data glossary.

15. The system of claim 9, the measuring further comprising: using a maintenance algorithm to determine the usage dimension of a term; including the term in the list of terms considered for the maintenance action; and prioritizing the list of terms considered for the maintenance action based on a maintenance priority.

16. The system of claim 9, wherein the usage dimension is based on the usage of a term by one or more users.

17. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processors to perform a method, the method comprising: receiving a data glossary comprising a list of terms; training a machine learning model with one or more training sets of data based on the data glossary; measuring a usage dimension for a set of the terms from the list of terms; selecting candidate terms from the set based on the usage dimension with the trained machine learning model, wherein the usage dimension indicates a need for definition improvement of the selected candidate terms, wherein the usage dimension is based on frequency of use and need for improvement; determining maintenance priority for the candidate as based on the usage dimension; and performing a maintenance action on the candidate terms based on the selecting by the machine learning model and the maintenance priority.

18. The computer program product of claim 17, the selecting further comprising: determining that a term has crossed a maintenance threshold based on the usage dimension.

19. The computer program product of claim 17 the selecting further comprising: providing the term to a user; receiving, from the user, a directive to proceed with the maintenance action; and including the term in the candidate terms.

20. The computer program product of claim 17, the method further comprising: storing one or more relationship attributes representing a related data structure for a grouping of the terms.
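The selection refinements in the dependent claims (the maintenance threshold of claim 2, the user directive of claim 3, the skip indicator of claim 6, and the prioritization of claim 7) can be sketched in a few lines of Python. This is a hedged illustration only: the claims do not give a formula, so combining frequency and need as a product, the `GlossaryTerm` fields, and the `confirm` callback are assumptions, and the trained machine learning model of claim 1 is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GlossaryTerm:
    """Hypothetical glossary entry; field names are illustrative."""
    name: str
    frequency: int       # how often the term is used (assumed input)
    need: float          # 0.0 to 1.0 estimate of how much the definition needs work
    skip: bool = False   # skip indicator stored in the glossary (claim 6)


def usage_dimension(term: GlossaryTerm) -> float:
    """Assumed scoring: frequency of use weighted by need for improvement."""
    return term.frequency * term.need


def select_and_prioritize(
    terms: List[GlossaryTerm],
    threshold: float,
    confirm: Callable[[GlossaryTerm], bool],
) -> List[GlossaryTerm]:
    """Return confirmed candidates, highest usage dimension first."""
    candidates = []
    for term in terms:
        if term.skip:
            continue  # claim 6: no maintenance for terms marked to skip
        if usage_dimension(term) <= threshold:
            continue  # claim 2: term has not crossed the maintenance threshold
        if confirm(term):
            candidates.append(term)  # claim 3: user directed us to proceed
    # claim 7: prioritize the list of terms considered for maintenance
    return sorted(candidates, key=usage_dimension, reverse=True)


terms = [
    GlossaryTerm("churn", frequency=40, need=0.9),
    GlossaryTerm("SKU", frequency=100, need=0.1),
    GlossaryTerm("ARR", frequency=30, need=0.8, skip=True),
    GlossaryTerm("cohort", frequency=25, need=0.6),
]

# Auto-approve every term for the demo; a real system would ask a user.
queue = select_and_prioritize(terms, threshold=12.0, confirm=lambda t: True)
print([t.name for t in queue])  # ['churn', 'cohort']
```

"SKU" is heavily used but scores below the threshold because its definition needs little work, and "ARR" is excluded by its skip indicator even though its score is high.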

Assignees

IBM

Inventors

Classifications

  • Data logging (G06F11/14, G06F11/2205 take precedence) · CPC title

  • G06F40/247 · Primary

    Thesauruses; Synonyms · CPC title

  • G06F40/237 · Primary

    Lexical tools · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • G06F40/242 · Primary

    Dictionaries · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12050866B2 cover?
A system may receive a data glossary comprising a list of terms. The system may then measure a usage dimension for a set of the terms from the list of terms. The system may select a candidate term from the set based on the usage dimension and perform a maintenance action on the candidate terms.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/247. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 30, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).