System and method for data management
US-2019121898-A1 · Apr 25, 2019 · US
US11436243B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11436243-B2 |
| Application number | US-202016865662-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 4, 2020 |
| Priority date | May 4, 2020 |
| Publication date | Sep 6, 2022 |
| Grant date | Sep 6, 2022 |
Official abstract text for this publication.
A data harvester enhances compliance audits by characterizing data sources, sampling data in one or more of the data sources to determine likelihood of success of the data harvest, estimating time for the data harvest, making recommendations from the samples based on machine learning relating to previous runs, then sampling additional data while estimating the expected completion time. The harvested data may then be analyzed and compared to compliance requirements, and a compliance report may be generated.
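The time-estimation step in the abstract can be illustrated with a simple extrapolation from a sample run. This is a minimal sketch, not the patented method: the `SampleRun` type, the `estimate_full_harvest_seconds` function, and the assumption that throughput observed on the sample holds for the whole data source are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass
class SampleRun:
    """Measurements from harvesting a sample (hypothetical structure)."""
    bytes_processed: int
    seconds_elapsed: float


def estimate_full_harvest_seconds(sample: SampleRun, total_bytes: int) -> float:
    """Extrapolate total harvest time from the sample's throughput.

    Assumption: the full data source is processed at the same
    bytes-per-second rate as the sample.
    """
    if sample.bytes_processed <= 0 or sample.seconds_elapsed <= 0:
        raise ValueError("sample must have positive size and duration")
    throughput = sample.bytes_processed / sample.seconds_elapsed
    return total_bytes / throughput


# Example: a 100 MB sample took 50 s; the source holds 1 GB in total.
sample = SampleRun(bytes_processed=100_000_000, seconds_elapsed=50.0)
estimate = estimate_full_harvest_seconds(sample, total_bytes=1_000_000_000)
```

A linear extrapolation like this ignores effects such as varying document types or network conditions, which the claims address separately through data source characterization.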
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: at least one computer processor; one or more computer readable storage media; and computer program instructions, the computer program instructions being stored on the one or more computer readable storage media for execution by the at least one computer processor, and the computer program instructions including instructions to: perform data harvesting on a subset of data in a data source; determine, based on data harvested from the subset of data in the data source, a likelihood that performing data harvesting on all of the data in the data source will meet one or more compliance requirements; and responsive to determining, based on the data harvested from the subset of data in the data source, a likelihood below a threshold that performing data harvesting on all of the data in the data source will meet the one or more compliance requirements: adjust a size of the subset of data in the data source from which data harvesting is performed.
2. The apparatus of claim 1, further comprising program instructions to characterize the data source according to a plurality of the following: total size of the data source; total number of documents in the data source; types of documents in the data source; and network characteristics of the data source.
3. The apparatus of claim 2, further comprising, responsive to the data source including a plurality of mailboxes, program instructions to characterize the data source according to organization, mailbox sizes and attachment percentages.
4. The apparatus of claim 1, further comprising program instructions to generate, based, at least in part, on machine learning, at least one recommendation based on historical data stored in a knowledgebase regarding at least one previous data harvest run.
5. The apparatus of claim 1, further comprising program instructions to: estimate a time for processing the adjusted size of the subset of data in the data source; and perform data harvesting on any additional data included in the adjusted size of the subset of data in the data source.
6. An article of manufacture comprising software stored on a computer readable storage medium, the software comprising program instructions to: perform data harvesting on a subset of data in a data source; determine, based on data harvested from the subset of data in the data source, a likelihood that performing data harvesting on all of the data in the data source will meet one or more compliance requirements; and responsive to determining, based on the data harvested from the subset of data in the data source, a likelihood below a threshold that performing data harvesting on all of the data in the data source will meet the one or more compliance requirements: adjust a size of the subset of data in the data source from which data harvesting is performed.
7. The article of manufacture of claim 6, further comprising program instructions to characterize the data source according to a plurality of the following: total size of the data source; total number of documents in the data source; types of documents in the data source; network characteristics of the data source; and when the data source includes a plurality of mailboxes, according to organization, mailbox sizes and attachment percentages.
8. The article of manufacture of claim 6, further comprising program instructions to generate, based, at least in part, on machine learning at least one recommendation based on historical data stored in a knowledgebase regarding at least one previous data harvest run.
9. The article of manufacture of claim 6, further comprising instructions to: estimate a time for processing the adjusted size of the subset of data in the data source; and perform data harvesting on any additional data included in the adjusted size of the subset of data in the data source.
10. A method for harvesting data, the method comprising: performing data harvesting on a subset of data in a data source; determining, based on data harvested from the subset of data in the data source, a likelihood that performing data harvesting on all of the data in the data source will meet one or more compliance requirements; and responsive to determining, based on the data harvested from the subset of data in the data source, a likelihood below a threshold that performing data harvesting on all of the data in the data source will meet the one or more compliance requirements: adjusting a size of the subset of data in the data source from which data harvesting is performed.
11. The method of claim 10, further comprising characterizing the data source according to a plurality of the following: total size of the data source; total number of documents in the data source; types of documents in the data source; and network characteristics of the data source.
12. The method of claim 11, further comprising, responsive to the data source including a plurality of mailboxes, characterizing the data source according to organization, mailbox sizes and attachment percentages.
13. The method of claim 10, further comprising generating, based, at least in part, on machine learning at least one recommendation based on historical data stored in a knowledgebase regarding at least one previous data harvest run.
14. The method of claim 10, further comprising: estimating a time for processing adjusted size of the subset of data in the data source; and performing data harvesting on any additional data included in the adjusted size of the subset of data in the data source.
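The core step shared by the independent claims (harvest a subset, derive a likelihood, adjust the subset size when the likelihood falls below a threshold) can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: the claims do not say how the likelihood is computed or in which direction the size is adjusted. Here the likelihood is assumed to be the fraction of sampled documents that harvest successfully, and the adjustment is assumed to grow the subset. The names `success_rate`, `sample_and_adjust`, `harvest_one`, `threshold`, and `growth` are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def success_rate(sample: Sequence[object],
                 harvest_one: Callable[[object], bool]) -> float:
    """Fraction of sampled documents harvested successfully.

    Assumption: this fraction stands in for the likelihood that a
    full harvest would meet the compliance requirements.
    """
    if not sample:
        return 0.0
    return sum(1 for doc in sample if harvest_one(doc)) / len(sample)


def sample_and_adjust(documents: Sequence[object],
                      harvest_one: Callable[[object], bool],
                      subset_size: int,
                      threshold: float = 0.9,
                      growth: int = 2) -> Tuple[float, int]:
    """Harvest a subset, then enlarge it if the likelihood is too low.

    Returns the observed likelihood and the (possibly adjusted) size.
    """
    # Clamp the requested size to the available documents.
    subset_size = max(1, min(subset_size, len(documents)))
    likelihood = success_rate(documents[:subset_size], harvest_one)
    if likelihood < threshold:
        # Assumed adjustment policy: multiply the subset size, capped
        # at the size of the data source.
        subset_size = min(len(documents), subset_size * growth)
    return likelihood, subset_size


# Example: 100 documents, of which only even-numbered ones harvest cleanly.
docs = list(range(100))
result = sample_and_adjust(docs, lambda d: d % 2 == 0, subset_size=10)
```

In this example the first ten documents yield a success rate of 0.5, which is below the 0.9 threshold, so the subset size doubles to 20. Harvesting the additional documents in the enlarged subset would correspond to the further step recited in claims 5, 9, and 14.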
Query processing support for facilitating data mining operations in structured databases · CPC title
Machine learning · CPC title
Presentation of query results · CPC title
Knowledge representation; Symbolic representation · CPC title