Data-driven enrichment of database elements
US-2022350810-A1 · Nov 3, 2022 · US
US11720533B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11720533-B2 |
| Application number | US-202117536860-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 29, 2021 |
| Priority date | Nov 29, 2021 |
| Publication date | Aug 8, 2023 |
| Grant date | Aug 8, 2023 |
Official abstract text for this publication.
Techniques for automatically determining different data types found in databases are disclosed. In one example, a computer implemented method comprises receiving a portion of identifying information for one or more components of a database, and generating one or more descriptions for the one or more components based at least in part on the portion of the identifying information for the one or more components. The one or more descriptions are inputted to one or more machine learning models, and, using the one or more machine learning models, one or more data types associated with the one or more components are predicted. The prediction is based at least in part on the one or more descriptions.
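As a rough illustration of the description-generation step in the abstract, the sketch below turns column or table names into plain-word descriptions. The claims describe a trained convolutional or recurrent neural network for expanding acronyms and abbreviations; the fixed lookup table here is only a stand-in, and `ABBREVIATIONS`, `split_identifier`, and `describe` are hypothetical names that do not come from the patent.

```python
import re

# Hypothetical abbreviation table. The claims describe a trained CNN or RNN
# for this step, so a fixed lookup is only a stand-in for illustration.
ABBREVIATIONS = {
    "cust": "customer",
    "addr": "address",
    "dob": "date of birth",
    "acct": "account",
    "num": "number",
}


def split_identifier(name: str) -> list[str]:
    """Split a name on underscores, hyphens, whitespace, and camelCase."""
    # Insert a space at each lowercase/digit-to-uppercase boundary.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [tok.lower() for tok in re.split(r"[\s_\-]+", spaced) if tok]


def describe(name: str) -> str:
    """Build a description, expanding tokens found in the lookup table."""
    tokens = split_identifier(name)
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    for column in ("cust_addr", "custDob", "ACCT_NUM", "order_total"):
        print(column, "->", describe(column))
```

In the pipeline the abstract describes, strings like these would then be passed to the machine learning models that predict a data type for each component; that prediction step is not sketched here.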
Opening claim text (preview).
What is claimed is:
1. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by one or more processors to cause the one or more processors to: receive a portion of identifying information for one or more components of a database; generate one or more descriptions for the one or more components based at least in part on the portion of the identifying information for the one or more components; input the one or more descriptions and create, read, update and delete operations data of the database to one or more machine learning models; predict, using the one or more machine learning models, one or more data types associated with the one or more components, wherein the prediction is based at least in part on the one or more descriptions and the create, read, update and delete operations data; wherein the predicting comprises: extracting from the create, read, update and delete operations data counts of a number of one or more of data reads, data writes, data deletes and data updates over a given time period for the one or more components; and determining, based at least in part on the counts, the one or more data types associated with the one or more components; and wherein the program instructions further cause the one or more processors to train the one or more machine learning models with: (i) labeled training data comprising respective ones of a plurality of data types corresponding to respective ones of a plurality of database components and respective ones of a plurality of descriptions of the database components; and (ii) data comprising correspondence between the respective ones of the plurality of data types and frequency of create, read, update and delete operations.
2. The computer program product of claim 1, wherein the one or more components comprise at least one of one or more columns, one or more rows and one or more tables of the database, and the portion of the identifying information comprises at least one of one or more column names, one or more row names and one or more table names.
3. The computer program product of claim 2, wherein the one or more column names, the one or more row names and the one or more table names comprise at least one of one or more acronyms and one or more abbreviations.
4. The computer program product of claim 3, wherein, in generating the one or more descriptions for the one or more components, the program instructions cause the one or more processors to expand the one or more acronyms and the one or more abbreviations into one or more words.
5. The computer program product of claim 4, wherein the program instructions further cause the one or more processors to use one of a convolutional neural network and a recurrent neural network to expand the one or more acronyms and the one or more abbreviations into the one or more words.
6. The computer program product of claim 5, wherein the program instructions further cause the one or more processors to train one of the convolutional neural network and the recurrent neural network with training data comprising respective ones of at least one of a plurality of acronyms and a plurality of abbreviations paired with respective ones of a plurality of definitions.
7. The computer program product of claim 4, wherein, in expanding the one or more acronyms and the one or more abbreviations, the program instructions cause the one or more processors to: extract a plurality of character-grams from the one or more column names, the one or more row names and the one or more table names; order the plurality of character-grams according to frequency of occurrence; identify a subset of the plurality of character-grams exceeding a threshold number of occurrences.
8. The computer program product of claim 1, wherein the program instructions further cause the one or more processors to input table statistics of the database to the one or more machine learning models, wherein the prediction is based at least in part on the table statistics.
9. The computer program product of claim 1, wherein the program instructions further cause the one or more processors to input at least one of foreign key relationships and primary key relationships of the database to the one or more machine learning models, wherein the prediction is based at least in part on at least one of the foreign key relationships and the primary key relationships.
10. The computer program product of claim 1, wherein the program instructions further cause the one or more processors to transmit the one or more data types associated with the one or more components to one or more users via one or more user devices.
11. The computer program product of claim 10, wherein the program instructions further cause the one or more processors to: receive feedback from the one or more users regarding accuracy of the one or more data types; and train the one or more machine learning models based at least in part on the feedback.
12. A computer implemented method, comprising: receiving a portion of identifying information for one or more components of a database; generating one or more descriptions for the one or more components based at least in part on the portion of the identifying information for the one or more components; inputting the one or more descriptions and create, read, update and delete operations data of the database to one or more machine learning models; and predicting, using the one or more machine learning models, one or more data types associated with the one or more components; wherein the prediction is based at least in part on the one or more descriptions and the create, read, update and delete operations data; wherein the predicting comprises: extracting from the create, read, update and delete operations data counts of a number of one or more of data reads, data writes, data deletes and data updates over a given time period for the one or more components; and determining, based at least in part on the counts, the one or more data types associated with the one or more components; and wherein the computer implemented method further comprises training the one or more machine learning models with: (i) labeled training data comprising respective ones of a plurality of data types corresponding to respective ones of a plurality of database components and respective ones of a plurality of descriptions of the database components; and (ii) data comprising correspondence between the respective ones of the plurality of data types and frequency of create, read, update and delete operations; and wherein the computer implemented method is performed by at least one processing device comprising a processor coupled to a memory when executing program code.
13. The computer implemented method of claim 12, wherein the one or more components comprise at least one of one or more columns, one or more rows and one or more tables of the database, and the portion of the identifying information comprises at least one of one or more column names, one or more row names and one or more table names.
14. The computer implemented method of claim 13, wherein the one or more column names, the one or more row names and the one or more table names comprise at least one of one or more acronyms and one or more abbreviations.
15. The computer implemented method of claim 14, wherein generating the one or more descriptions for the one or more components comprises expanding the one or more acronyms and the one or more abbreviations into one or more words.
16. An apparatus, co
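Two counting steps recited in the claims above can be sketched in a few lines: the per-component create, read, update and delete counts over a time period (claims 1 and 12), and the character-gram extraction, frequency ordering and thresholding of claim 7. This is a minimal sketch under stated assumptions. The `(timestamp, component, operation)` log layout, the function names, and the default gram length and threshold are all hypothetical and are not specified in the patent text shown here.

```python
from collections import Counter
from datetime import datetime


def crud_counts(log, start, end):
    """Count operations per component for timestamps in [start, end).

    `log` is an iterable of (timestamp, component, operation) tuples.
    This record layout is an assumption made for the sketch.
    """
    counts = {}
    for timestamp, component, operation in log:
        if start <= timestamp < end:
            counts.setdefault(component, Counter())[operation] += 1
    return counts


def frequent_char_grams(names, n=3, threshold=2):
    """Return (gram, count) pairs ordered by frequency, keeping only
    grams that occur more than `threshold` times across all names."""
    grams = Counter()
    for name in names:
        lowered = name.lower()
        for i in range(len(lowered) - n + 1):
            grams[lowered[i:i + n]] += 1
    # most_common() orders by descending count.
    return [(g, c) for g, c in grams.most_common() if c > threshold]


if __name__ == "__main__":
    log = [
        (datetime(2023, 1, 1), "users.email", "read"),
        (datetime(2023, 1, 2), "users.email", "read"),
        (datetime(2023, 1, 3), "users.email", "update"),
        (datetime(2023, 2, 1), "users.email", "delete"),  # outside window
        (datetime(2023, 1, 5), "orders.total", "write"),
    ]
    window = crud_counts(log, datetime(2023, 1, 1), datetime(2023, 2, 1))
    print(dict(window["users.email"]))
    print(frequent_char_grams(["cust_id", "cust_name", "cust_addr", "order_id"]))
```

Counts like these would serve as input features alongside the generated descriptions. How the trained models weigh them against each other is not described in the claim text shown here.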
with details for schema evolution support · CPC title
Tablespace storage structures; Management thereof · CPC title
Change logging, detection, and notification (replication G06F16/27) · CPC title
Approximate or statistical queries · CPC title
based on feedback of a supervisor · CPC title