Data harvester

US11436243B2 · US · B2

Patent metadata
Publication number: US-11436243-B2
Application number: US-202016865662-A
Country: US
Kind code: B2
Filing date: May 4, 2020
Priority date: May 4, 2020
Publication date: Sep 6, 2022
Grant date: Sep 6, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data harvester enhances compliance audits by characterizing data sources, sampling data in one or more of the data sources to determine likelihood of success of the data harvest, estimating time for the data harvest, making recommendations from the samples based on machine learning relating to previous runs, then sampling additional data while estimated expected completion time. The harvested data may then be analyzed and compared to compliance requirements, and a compliance report may be generated.
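The abstract describes a sequence: characterize the source, estimate how long a harvest will take, then keep sampling additional data while the estimate stays within the expected completion time. The sketch below illustrates that time-budgeted sampling step. It is not the patented implementation. The `SourceProfile` fields loosely follow the characterization attributes in claims 2 and 3. The linear time model, the `throughput_bytes_per_s` stand-in for "network characteristics", the `time_budget_s` parameter, and all function names are hypothetical assumptions. The machine-learning recommendation step is not modeled.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class SourceProfile:
    """Characterization of one data source (fields loosely modeled on claims 2 and 3)."""
    total_bytes: int
    total_documents: int
    document_types: dict[str, int] = field(default_factory=dict)  # e.g. {"eml": 900}
    # Hypothetical stand-in for "network characteristics": sustained read rate.
    throughput_bytes_per_s: float = 1.0


def estimate_seconds(profile: SourceProfile, n_docs: int) -> float:
    """Assumed linear time model: n_docs * average document size / throughput."""
    avg_doc_bytes = profile.total_bytes / max(profile.total_documents, 1)
    return n_docs * avg_doc_bytes / profile.throughput_bytes_per_s


def plan_sample(profile: SourceProfile, initial_docs: int,
                step_docs: int, time_budget_s: float) -> int:
    """Return how many documents to harvest.

    Start with initial_docs, then add step_docs at a time while the estimated
    completion time stays within time_budget_s. Returns 0 if even the initial
    sample would exceed the budget.
    """
    if step_docs <= 0:
        raise ValueError("step_docs must be positive")
    n = min(initial_docs, profile.total_documents)
    if estimate_seconds(profile, n) > time_budget_s:
        return 0
    while n < profile.total_documents:
        candidate = min(n + step_docs, profile.total_documents)
        if estimate_seconds(profile, candidate) > time_budget_s:
            break  # the next step would overrun the time budget
        n = candidate
    return n


# Example: 100,000 documents of 100 kB each, read at 10 MB/s (0.01 s per document).
profile = SourceProfile(total_bytes=10_000_000_000, total_documents=100_000,
                        document_types={"eml": 100_000},
                        throughput_bytes_per_s=10_000_000.0)
print(plan_sample(profile, 1_000, 1_000, time_budget_s=65.0))  # → 6000
```

A fixed step size keeps the sketch simple. A real harvester could instead refine the time model after each batch, using the throughput it actually observed.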

Claims

What is claimed is:

1. An apparatus comprising: at least one computer processor; one or more computer readable storage media; and computer program instructions, the computer program instructions being stored on the one or more computer readable storage media for execution by the at least one computer processor, and the computer program instructions including instructions to: perform data harvesting on a subset of data in a data source; determine, based on data harvested from the subset of data in the data source, a likelihood that performing data harvesting on all of the data in the data source will meet one or more compliance requirements; and responsive to determining, based on the data harvested from the subset of data in the data source, a likelihood below a threshold that performing data harvesting on all of the data in the data source will meet the one or more compliance requirements: adjust a size of the subset of data in the data source from which data harvesting is performed.

2. The apparatus of claim 1, further comprising program instructions to characterize the data source according to a plurality of the following total size of the data source; total number of documents in the data source; types of documents in the data source; and network characteristics of the data source.

3. The apparatus of claim 2, further comprising, responsive to the data source including a plurality of mailboxes, program instructions to characterize the data source according to organization, mailbox sizes and attachment percentages.

4. The apparatus of claim 1, further comprising program instructions to generate, based, at least in part, on machine learning, at least one recommendation based on historical data stored in a knowledgebase regarding at least one previous data harvest run.

5. The apparatus of claim 1, further comprising program instructions to: estimate a time for processing the adjusted size of the subset of data in the data source; and perform data harvesting on any additional data included in the adjusted size of the subset of data in the data source.

6. An article of manufacture comprising software stored on a computer readable storage medium, the software comprising program instructions to: perform data harvesting on a subset of data in a data source; determine, based on data harvested from the subset of data in the data source, a likelihood that performing data harvesting on all of the data in the data source will meet one or more compliance requirements; and responsive to determining, based on the data harvested from the subset of data in the data source, a likelihood below a threshold that performing data harvesting on all of the data in the data source will meet the one or more compliance requirements: adjust a size of the subset of data in the data source from which data harvesting is performed.

7. The article of manufacture of claim 6, further comprising program instructions to characterize the data source according to a plurality of the following: total size of the data source; total number of documents in the data source; types of documents in the data source; network characteristics of the data source; and when the data source includes a plurality of mailboxes, according to organization, mailbox sizes and attachment percentages.

8. The article of manufacture of claim 6, further comprising program instructions to generate, based, at least in part, on machine learning at least one recommendation based on historical data stored in a knowledgebase regarding at least one previous data harvest run.

9. The article of manufacture of claim 6, further comprising instructions to: estimate a time for processing the adjusted size of the subset of data in the data source; and perform data harvesting on any additional data included in the adjusted size of the subset of data in the data source.

10. A method for harvesting data, the method comprising: performing data harvesting on a subset of data in a data source determining, based on data harvested from the subset of data in the data source, a likelihood that performing data harvesting on all of the data in the data source will meet one or more compliance requirements; and responsive to determining, based on the data harvested from the subset of data in the data source, a likelihood below a threshold that performing data harvesting on all of the data in the data source will meet the one or more compliance requirements: adjusting a size of the subset of data in the data source from which data harvesting is performed.

11. The method of claim 10, further comprising characterizing the data source according to a plurality of the following: total size of the data source; total number of documents in the data source; types of documents in the data source; and network characteristics of the data source.

12. The method of claim 11, further comprising, responsive to the data source including a plurality of mailboxes, characterizing the data source according to organization, mailbox sizes and attachment percentages.

13. The method of claim 10, further comprising generating, based, at least in part, on machine learning at least one recommendation based on historical data stored in a knowledgebase regarding at least one previous data harvest run.

14. The method of claim 10, further comprising: estimating a time for processing adjusted size of the subset of data in the data source; and performing data harvesting on any additional data included in the adjusted size of the subset of data in the data source.
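Independent claims 1, 6, and 10 share one core loop. The system harvests a subset, estimates from it whether a full harvest will meet the compliance requirements, and resizes the subset when that likelihood falls below a threshold. The claims do not say how the likelihood is computed, so the sketch below makes two assumptions. First, it treats the likelihood as the share of sampled documents that pass a hypothetical `passes_compliance` check. Second, it doubles the subset, capped at the whole source, on each retry. `harvest_with_adaptive_subset`, `passes_compliance`, and the doubling rule are illustrative choices only.

```python
from __future__ import annotations
import random
from typing import Callable, Sequence


def harvest_with_adaptive_subset(
    source: Sequence[str],
    passes_compliance: Callable[[str], bool],
    threshold: float,
    initial_size: int,
    seed: int = 0,
) -> tuple[int, float]:
    """Return (final subset size, estimated likelihood).

    Assumption: the likelihood that a full harvest meets the compliance
    requirements is estimated as the share of harvested sample documents
    that pass `passes_compliance`. Below `threshold`, the subset is doubled
    (capped at the whole source) and harvested again.
    """
    if not source:
        raise ValueError("data source is empty")
    rng = random.Random(seed)
    size = max(1, min(initial_size, len(source)))
    while True:
        subset = rng.sample(list(source), size)          # harvest a random subset
        likelihood = sum(map(passes_compliance, subset)) / size
        if likelihood >= threshold or size == len(source):
            return size, likelihood
        size = min(size * 2, len(source))                # adjust the subset size


# Example: a source in which every document fails the hypothetical check.
docs = ["unreadable"] * 100
print(harvest_with_adaptive_subset(docs, lambda d: d == "ok", 0.9, 10))
# → (100, 0.0): the subset grows 10 → 20 → 40 → 80 → 100
```

Growing the subset, rather than shrinking it, is consistent with dependent claims 5, 9, and 14, which refer to harvesting "any additional data included in the adjusted size" of the subset. Pairing this loop with a time estimate would bound how far the subset is allowed to grow.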

Assignees

IBM

Classifications

  • Query processing support for facilitating data mining operations in structured databases · CPC title

  • Machine learning · CPC title

  • Presentation of query results · CPC title

  • Knowledge representation; Symbolic representation · CPC title

Frequently asked questions

What does patent US11436243B2 cover?
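It claims an apparatus, an article of manufacture, and a method for data harvesting in compliance audits. Data is first harvested from a subset of a data source. From that subset, the system estimates the likelihood that harvesting the whole source will meet one or more compliance requirements. When that likelihood is below a threshold, the size of the subset is adjusted. Dependent claims add characterization of the data source (size, document count, document types, network characteristics, and mailbox details). They also cover machine-learning recommendations based on previous harvest runs and estimating the time needed to process the adjusted subset.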
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/2465. Mapped technology areas include Physics.
When was this patent published?
It was published on Sep 6, 2022, with kind code B2. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).