Analysis of a system for matching data records
US10901979B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10901979-B2 |
| Application number | US-201816115622-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 29, 2018 |
| Priority date | Aug 29, 2018 |
| Publication date | Jan 26, 2021 |
| Grant date | Jan 26, 2021 |
Official abstract text for this publication.
In an example computer-implemented method, a dataset and a query including an expression to be matched to the dataset is received via a processor. A false positive rate (FPR) and a false negative rate (FNR) is calculated via the processor for each possible value assignment of a plurality of possible value assignments in response to detecting a missing value in the dataset. A value assignment is selected, via the processor, from the plurality of possible value assignments based on the FPR and the FNR. A response to the query is generated via the processor based on the selected value assignment.
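The abstract's flow (enumerate the possible value assignments for a missing value, score each by FPR and FNR, pick one) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the patent: missing fields are treated as independent Booleans with known prior probabilities, "FPR" and "FNR" are computed as the probabilities of a false or missed match under those priors, and the selection rule is a weighted sum in the spirit of the loss function mentioned in claims 12 and 18. The names `rates_for_assignment`, `select_assignment`, `priors`, `w_fp`, and `w_fn` are hypothetical.

```python
from itertools import product


def rates_for_assignment(expr, record, priors, assignment):
    """Return (FPR, FNR) for filling the missing fields with `assignment`.

    Assumed model (not from the patent): each missing field is an
    independent Boolean that is truly True with probability priors[field].
    """
    missing = list(priors)
    decided = expr({**record, **assignment})  # match result under the guess
    fpr = fnr = 0.0
    # Sum over every possible true state of the missing fields.
    for truth_values in product([False, True], repeat=len(missing)):
        truth = dict(zip(missing, truth_values))
        prob = 1.0
        for field, value in truth.items():
            prob *= priors[field] if value else 1.0 - priors[field]
        actual = expr({**record, **truth})  # match result under the truth
        if decided and not actual:
            fpr += prob  # we would report a match that is not real
        elif actual and not decided:
            fnr += prob  # we would miss a real match
    return fpr, fnr


def select_assignment(expr, record, priors, w_fp=1.0, w_fn=1.0):
    """Exhaustively search all assignments for the lowest weighted loss."""
    missing = list(priors)
    best = None
    for values in product([False, True], repeat=len(missing)):
        assignment = dict(zip(missing, values))
        fpr, fnr = rates_for_assignment(expr, record, priors, assignment)
        loss = w_fp * fpr + w_fn * fnr
        if best is None or loss < best[0]:
            best = (loss, assignment, fpr, fnr)
    return best


# Hypothetical record: "new_device" is missing and assumed True 20% of the time.
record = {"failed_logins": True}
priors = {"new_device": 0.2}
expr = lambda r: r["failed_logins"] and r["new_device"]

print(select_assignment(expr, record, priors))             # equal weights
print(select_assignment(expr, record, priors, w_fn=10.0))  # misses cost more
```

With equal weights the sketch fills the missing field with `False`; when a missed match is weighted ten times more heavily, as might suit the malicious-event use case of claim 6, it fills the field with `True` instead.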
Opening claim text (preview).
What is claimed is:
1. A system, comprising a processor to: receive a dataset and a query comprising an expression to be matched to the dataset; calculate a false positive rate (FPR) and a false negative rate (FNR) for each possible value assignment of a plurality of possible value assignments in response to detecting a missing value in the dataset; select a value assignment from the plurality of possible value assignments based on the FPR and the FNR, wherein the processor is to generate a Pareto front for the plurality of possible value assignments based on the FNRs and the FPRs of the plurality of possible value assignments and select the value assignment from a subset of the possible value assignments on the Pareto front; and generate a response to the query based on the selected value assignment.
2. The system of claim 1, wherein the processor is to convert the expression to conjunctive normal form and calculate the FPR and the FNR using the conjunctive normal form of the expression.
3. The system of claim 1, wherein the processor is to calculate the FPR or the FNR using an inclusion-exclusion principle.
4. The system of claim 1, wherein the processor is to calculate the FPR or the FNR based on a set of disjunctions generated based on the query.
5. The system of claim 1, wherein the response comprises a set of events of interest detected based on the selected value assignment.
6. The system of claim 5, wherein the set of events of interest comprises a malicious event.
7. The system of claim 1, wherein the processor is to convert the expression to a normalized form.
8. A computer-implemented method, comprising: receiving, via a processor, a dataset and a query comprising an expression to be matched to the dataset; calculating, via the processor, a false positive rate (FPR) and a false negative rate (FNR) for each possible value assignment of a plurality of possible value assignments in response to detecting a missing value in the dataset; selecting, via the processor, a value assignment from the plurality of possible value assignments based on the FPR and the FNR, wherein selecting the value assignment comprises generating a Pareto front for the plurality of possible value assignments based on the FNRs and the FPRs of the plurality of possible value assignments, and selecting the value assignment from a subset of the possible value assignments on the Pareto front; and generating, via the processor, a response to the query based on the selected value assignment.
9. The computer-implemented method of claim 8, comprising converting the expression to a conjunctive normal form and calculating the FPR and the FNR using the conjunctive normal form of the expression.
10. The computer-implemented method of claim 8, wherein calculating the FPR comprises using an inclusion-exclusion principle.
11. The computer-implemented method of claim 8, wherein calculating the FNR comprises generating a set of disjunctions based on the query.
12. The computer-implemented method of claim 8, wherein selecting the value assignment comprises performing a simple exhaustive search to minimize a predefined loss function that comprises a combination of the FPR and the FNR.
13. The computer-implemented method of claim 8, wherein generating the response comprises detecting an event of interest based on the selected value assignment.
14. A computer program product comprising a computer-readable storage medium having program code embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program code executable by a processor to cause the processor to: receive a dataset and a query comprising an expression to be matched to the dataset; calculate a false positive rate (FPR) and a false negative rate (FNR) for each possible value assignment of a plurality of possible value assignments in response to detecting a missing value in the dataset; select a value assignment from the plurality of possible value assignments based on the FPR and the FNR, wherein the processor is to generate a Pareto front for the plurality of possible value assignments based on the FNRs and the FPRs of the plurality of possible value assignments and select the value assignment from a subset of the possible value assignments on the Pareto front; and generate a response to the query based on the selected value assignment.
15. The computer program product of claim 14, further comprising program code executable by the processor to convert the expression to a conjunctive normal form and calculate the FPR and the FNR using the conjunctive normal form of the expression.
16. The computer program product of claim 14, further comprising program code executable by the processor to calculate the FPR or the FNR using an inclusion-exclusion principle.
17. The computer program product of claim 14, further comprising program code executable by the processor to calculate the FPR or the FNR by generating a set of disjunctions based on the query.
18. The computer program product of claim 14, further comprising program code executable by the processor to perform a simple exhaustive search to minimize a predefined loss function that is a combination of the FPR and the FNR.
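The Pareto-front limitation shared by independent claims 1, 8, and 14 can be sketched as follows. This is an illustrative reading, not the patented implementation: a candidate assignment is kept if no other candidate is at least as good on both FPR and FNR and strictly better on one. The claims do not say how the final assignment is chosen from the front, so the weighted-sum rule in `select_from_front` is an assumption, and the candidate labels and rate values are made up for the example.

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate dominates.

    Each candidate is (label, fpr, fnr); lower is better on both rates.
    """
    front = []
    for label, fpr, fnr in candidates:
        dominated = any(
            o_fpr <= fpr and o_fnr <= fnr and (o_fpr < fpr or o_fnr < fnr)
            for _, o_fpr, o_fnr in candidates
        )
        if not dominated:
            front.append((label, fpr, fnr))
    return front


def select_from_front(front, w_fp=1.0, w_fn=1.0):
    """Assumed selection rule: lowest weighted sum of FPR and FNR."""
    return min(front, key=lambda c: w_fp * c[1] + w_fn * c[2])


# Made-up (FPR, FNR) scores for four hypothetical value assignments.
candidates = [
    ("a", 0.10, 0.40),
    ("b", 0.30, 0.10),
    ("c", 0.35, 0.45),  # worse than "a" and "b" on both rates
    ("d", 0.05, 0.60),
]

front = pareto_front(candidates)
print([label for label, _, _ in front])  # labels that survive: a, b, d
print(select_from_front(front))          # lowest equal-weight sum is "b"
```

Restricting the choice to the front means the final selection only trades one error type against the other; any candidate off the front could be replaced by one that is no worse on either rate.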
using ranking · CPC title
Approximate or statistical queries · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Ensuring data consistency and integrity · CPC title