Database entity analysis
US-2017132289-A1 · May 11, 2017 · US
US10936599B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10936599-B2 |
| Application number | US-201816141356-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 25, 2018 |
| Priority date | Sep 29, 2017 |
| Publication date | Mar 2, 2021 |
| Grant date | Mar 2, 2021 |
Official abstract text for this publication.
Techniques are disclosed for providing adaptive recommendations for a data set. A data set can include one or more columns of data. The data set can be profiled in order to identify actions that can be applied to the data in order to enrich the data. The data set and actions that were applied to the data set can be stored. Actions that are applied to subsequent data sets can take into account the actions that were applied to prior data sets having similar profiles.
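The loop the abstract describes (profile a column, store the actions applied to it, then reuse those actions for later columns with a similar profile) can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: `ColumnSignature`, `profile_column`, `similarity`, `HistoryStore`, the three metadata fields, the scoring formula, and the 0.8 threshold are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSignature:
    """Hypothetical column-level signature: metadata describing one column."""
    data_type: str         # "int", "float", or "str"
    distinct_ratio: float  # distinct non-empty values / total cells
    null_ratio: float      # empty cells / total cells


def infer_type(cell):
    """Classify a single non-empty cell by the narrowest type that parses it."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(cell)
            return name
        except ValueError:
            continue
    return "str"


def profile_column(values):
    """Collect a simple signature for a column given as a list of string cells."""
    total = len(values) or 1
    non_empty = [v for v in values if v not in ("", None)]
    types = {infer_type(v) for v in non_empty}
    if len(types) == 1:
        data_type = types.pop()
    elif types == {"int", "float"}:
        data_type = "float"
    else:
        data_type = "str"
    return ColumnSignature(
        data_type=data_type,
        distinct_ratio=len(set(non_empty)) / total,
        null_ratio=(len(values) - len(non_empty)) / total,
    )


def similarity(a, b):
    """Assumed scoring rule: 0.0 for different types, else 1 minus the mean ratio gap."""
    if a.data_type != b.data_type:
        return 0.0
    gap = abs(a.distinct_ratio - b.distinct_ratio) + abs(a.null_ratio - b.null_ratio)
    return 1.0 - gap / 2


class HistoryStore:
    """In-memory stand-in for a store of (signature, actions) pairs."""

    def __init__(self):
        self._entries = []

    def record(self, signature, actions):
        self._entries.append((signature, list(actions)))

    def recommend(self, signature, threshold=0.8):
        """Return actions previously applied to sufficiently similar columns."""
        recommended = []
        for past_signature, actions in self._entries:
            if similarity(signature, past_signature) >= threshold:
                for action in actions:
                    if action not in recommended:
                        recommended.append(action)
        return recommended


store = HistoryStore()
# The action name below is a made-up label for whatever the user did.
store.record(profile_column(["10", "20", "", "30"]), ["fill_missing_with_zero"])
print(store.recommend(profile_column(["7", "8", "9", ""])))
print(store.recommend(profile_column(["a", "b"])))
```

In this sketch the second column has the same inferred type and the same ratios as the first, so the stored action is suggested again, while the text column scores 0.0 and gets no suggestion. A real system would presumably use a much richer signature and persist the history.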
Opening claim text (preview).
What is claimed is:
1. A method performed by an adaptive recommendation system, the method comprising: ingesting a first data set comprising a first column of data, wherein the first data set is a spreadsheet comprising a plurality of columns of data; profiling the first column of data in the first data set by collecting a first column level signature for the first column of data in the first data set, wherein the first column level signature comprises metadata describing the data in the first column of data; recommending one or more actions to be performed on the first column of data in the first data set based on the first column level signature of the first column of data in the first data set in order to enrich the first column of data; receiving one or more actions performed by a user on the first column of data; storing the one or more actions performed by the user on the first column of data in a history store; ingesting a second data set comprising a second column of data; profiling the second column of data in the second data set by collecting column level signatures for the second column of data in the second data set; determining column similarity between the first column of data in the first data set and the second column of data in the second data set; in response to determining that the first column of data in the first data set and the second column of data in the second data set are similar, identifying one or more actions performed by the user on the first column of data from data stored in the history store; recommending the one or more actions performed on the first column of data to the user; receiving one or more actions performed by the user on the second column of data; and storing the one or more actions performed by the user on the second column of data in the history store.
2. The method according to claim 1, wherein the one or more actions performed by the user comprises an action from a group consisting of a recommended action and a manual action input by the user.
3. The method according to claim 1, wherein the second data set is a spreadsheet comprising a second of columns of data.
4. The method according to claim 1, wherein the metadata describing the data in the first column of data comprises at least one from a group consisting of metrics information metadata, data type information metadata, and semantic type information metadata.
5. The method according to claim 1, wherein enriching the first column of data comprises enhancing the first column of data in accordance with specifications of the user.
6. The method according to claim 1, wherein enriching the first column of data comprises correcting typographical errors the column of data.
7. The method according to claim 1, wherein enriching the first column of data comprises providing additional information corresponding to a type of information in the column of data.
8. The method according to 8, wherein in response to the type of information in the first column of data being city information, enriching the first column of data to comprise at least one from a group consisting of latitude information, longitude information, and demographic information.
9. The method according to claim 1, wherein the history store comprises a metadata history store configured to store column level signatures for columns of data in the first data set and one or more actions that are performed on the column of data in the first data set.
10. The method according to claim 1, wherein determining column similarity between the first column of data in the first data set and the second column of data in the second data set comprises calculating a similarity score.
11. The method according to claim 1, further comprising: ingesting a third data set comprising a third column of data; profiling the third column of data in the third data set by collecting column level signatures for the third column of data in the third data set; determining column similarity between the third column of data in the third data set and one of the first column of data in the first data set and the second column of data in the second data set; in response to determining that the third column of data in the third data set and one of the first column of data in the first data set and the second column of data in the second data set are similar, identifying one or more actions performed by the user on the one of the first column of data in the first data set and the second column of data in the second data set that is similar to the third column of data in the third data set; recommending the one or more actions performed on the one of the first column of data in the first data set and the second column of data in the second data set that is similar to the third column of data in the third data set; receiving one or more actions performed by the user on the third column of data; and storing the one or more actions performed by the user on the third column of data in the history store.
12. A non-transitory computer readable medium storing a plurality of instructions which, when executed by a processor, causes the processor to perform a method comprising: ingesting a first data set comprising a first column of data, wherein the first data set is a spreadsheet comprising a plurality of columns of data; profiling the first column of data in the first data set by collecting a first column level signature for the first column of data in the first data set, wherein the first column level signature comprises metadata describing the data in the first column of data; recommending one or more actions to be performed on the first column of data in the first data set based on the first column level signature of the first column of data in the first data set in order to enrich the first column of data; receiving one or more actions performed by a user on the first column of data; storing the one or more actions performed by the user on the first column of data in a history store; ingesting a second data set comprising a second column of data; profiling the second column of data in the second data set by collecting column level signatures for the second column of data in the second data set; determining column similarity between the first column of data in the first data set and the second column of data in the second data set; in response to determining that the first column of data in the first data set and the second column of data in the second data set are similar, identifying one or more actions performed by the user on the first column of data from data stored in the history store; recommending the one or more actions performed on the first column of data to the user; receiving one or more actions performed by the user on the second column of data; and storing the one or more actions performed by the user on the second column of data in the history store.
13. The computer readable medium according to claim 12, wherein the one or more actions performed by the user comprises an action from a group consisting of a recommended action and a manual action input by the user.
14. The computer readable medium according to claim 12, wherein the second data set is a spreadsheet comprising a second plurality of columns of data.
15. The computer readable medium according to claim 12, wherein the metadata describing the data in the first column of data comprises at least one from a group consisting of metrics information metadata, data type information metadata, and semantic type information metadata.
16. A system comprising: a memory; and a
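The enrichment described in the claims for city columns (adding latitude and longitude once a column is recognized as city information) might look roughly like the sketch below. Everything named here is hypothetical: `CITY_COORDINATES` is a tiny illustrative lookup with approximate coordinates, and `semantic_type`, `enrich_city_column`, and the 0.5 match threshold are assumptions, not details taken from the patent.

```python
# Illustrative lookup with approximate coordinates; a real system would
# presumably consult an external reference source instead.
CITY_COORDINATES = {
    "paris": (48.8566, 2.3522),
    "tokyo": (35.6762, 139.6503),
}


def semantic_type(values, min_match=0.5):
    """Label a column "city" when enough non-empty cells match the lookup."""
    cells = [v.strip().lower() for v in values if v and v.strip()]
    if not cells:
        return "unknown"
    hits = sum(1 for cell in cells if cell in CITY_COORDINATES)
    return "city" if hits / len(cells) >= min_match else "unknown"


def enrich_city_column(values):
    """Return (city, latitude, longitude) rows; unrecognized cities get None."""
    rows = []
    for value in values:
        latitude, longitude = CITY_COORDINATES.get(
            value.strip().lower(), (None, None)
        )
        rows.append((value, latitude, longitude))
    return rows


column = ["Paris", "Tokyo", "Atlantis"]
if semantic_type(column) == "city":
    for row in enrich_city_column(column):
        print(row)
```

The semantic check runs first so that the enrichment is only offered for columns that mostly contain known cities; cells that cannot be resolved keep their original value and receive empty coordinates rather than being dropped.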
Automatic learning of transformation rules, e.g. from examples · CPC title
of spreadsheets (form-filling G06F40/174) · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Column-oriented storage; Management thereof · CPC title
using data annotations, e.g. user-defined metadata · CPC title