What technology area does this patent fall under?

Primary CPC classification G06F16/215. Mapped technology areas include Physics.

When was this patent published?

Publication date: May 10, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Decision tree with just-in-time nodal computations

US9336249B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9336249-B2
Application number	US-201313874281-A
Country	US
Kind code	B2
Filing date	Apr 30, 2013
Priority date	Apr 30, 2013
Publication date	May 10, 2016
Grant date	May 10, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method is disclosed for efficiently processing a large number of records. In the method, a computer system may obtain a plurality of records and a decision tree. The decision tree may include a distinction node corresponding to a distinction requiring completion of a computation. Due to the fact that the computation may be, in the overall context of the process, computationally expensive, it may initially be left uncomputed. Accordingly, if the distinction node is never reached when records are being processed, no computation time gets wasted. However, if and when the distinction node is reached, the computer system may complete the computation and make the distinction based on results of the computation.
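The deferred computation described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `zip_match`, `name_overlap`, `Node`, `Leaf`, `classify`, and `TREE`, the record fields, and the thresholds are all hypothetical, and `name_overlap` is only a stand-in for a genuinely expensive computation. The point is that each node's computation runs only when a record actually arrives at that node.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Record = Dict[str, str]

# Counts how often the costly computation actually runs (illustration only).
call_counts = {"name_overlap": 0}


def zip_match(pair: Record) -> float:
    """Cheap check: 1.0 when both profiles share a ZIP code, else 0.0."""
    return 1.0 if pair["zip_a"] == pair["zip_b"] else 0.0


def name_overlap(pair: Record) -> float:
    """Hypothetical stand-in for a costly similarity computation."""
    call_counts["name_overlap"] += 1
    a, b = set(pair["name_a"].lower()), set(pair["name_b"].lower())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    compute: Callable[[Record], float]  # run only when a record arrives here
    threshold: float
    below: Union["Node", Leaf]
    at_or_above: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf], pair: Record) -> str:
    """Walk the tree, computing each node's value only on arrival."""
    node = tree
    while isinstance(node, Node):
        value = node.compute(pair)  # just-in-time computation
        node = node.at_or_above if value >= node.threshold else node.below
    return node.label


# The expensive node sits below the cheap one, so it is skipped entirely
# for pairs that fail the ZIP check.
TREE = Node(
    compute=zip_match,
    threshold=0.5,
    below=Leaf("different"),
    at_or_above=Node(name_overlap, 0.8, Leaf("different"), Leaf("same")),
)

if __name__ == "__main__":
    pairs = [
        {"zip_a": "72712", "zip_b": "10001",
         "name_a": "John Smith", "name_b": "Jon Smith"},
        {"zip_a": "72712", "zip_b": "72712",
         "name_a": "John Smith", "name_b": "Jon Smith"},
    ]
    for pair in pairs:
        print(classify(TREE, pair))
    print("expensive computations run:", call_counts["name_overlap"])
```

In this sketch the first pair never reaches the second node, so `name_overlap` runs once for two records. Placing cheap distinctions near the root and deferring costly ones is a design choice of the example, not something the abstract prescribes.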

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for efficiently processing a large number of records, the method comprising: estimating, by a computer system, a probability that subject data is in a single class by: obtaining, by the computer system, a plurality of records, the plurality of records comprising the subject data; obtaining, by the computer system, two or more probability estimation trees, each of the two or more probability estimation trees comprising multiple paths; processing, by the computer system, the plurality of records through at least two of the two or more probability estimation trees the processing in each of the at least two of the two or more probability estimation trees comprising: arriving of the subject data, at a distinction node of the each of the at least two of the two or more probability estimation trees, the distinction node corresponding to a distinction requiring completion of an as yet uncomputed computation; completing, by the computer system after the arriving of the subject data at the distinction node of the each of the at least two of the two or more probability estimation trees, the computation; making the distinction based on results of the computation; selecting a path of the multiple paths based at least in part on the distinction; directing the subject data to another distinction node of the each of the at least two of the two or more probability estimation trees; and estimating a class for the subject data based at least in part on the path of the multiple paths selected; and combining, by the computer system, the class that is estimated of the at least two of the two or more probability estimation trees into the single class.
2. The method of claim 1, wherein at least one of the two or more probability estimation trees is programmed to perform record linkage.
3. The method of claim 2, wherein each record of the plurality of records comprises a customer profile.
4. The method of claim 3, wherein the at least one of the two or more probability estimation trees is programmed to identify records within the plurality of records that are likely to correspond to a common customer or household.
5. The method of claim 4, wherein the distinction corresponds to a metric characterizing a similarity between attributes of compared records of the plurality of records.
6. The method of claim 5, wherein the computation comprises calculation of the metric.
7. The method of claim 6, wherein the metric is selected from the group consisting of: a Levenshtein distance; a normalized Levenshtein distance; a trigram score; and a trigram ratio.
8. The method of claim 1, wherein the computer system provides a parallel computing environment.
9. The method of claim 8, wherein the computer system comprises a plurality of worker nodes.
10. The method of claim 9, wherein the processing is conducted by the plurality of worker nodes.
11. The method of claim 1, wherein the distinction corresponds to a metric characterizing a similarity between attributes of compared records of the plurality of records.
12. The method of claim 11, wherein the computation comprises calculation of the metric.
13. The method of claim 12, wherein the metric is selected from the group consisting of: a Levenshtein distance; a normalized Levenshtein distance; a trigram score; and a trigram ratio.
14. The method of claim 1, wherein: each record of the plurality of records comprises a customer profile; and the plurality of records comprises at least five hundred million records.
15. The method of claim 1, wherein: at least one of the two or more probability estimation trees is programmed to perform record linkage; each record of the plurality of records comprises a customer profile; the at least one of the two or more probability estimation trees is programmed to identify records within the plurality of records that are likely to correspond to a common customer or household; the distinction corresponds to a metric characterizing a similarity between attributes of compared records of the plurality of records; the computation comprises calculation of the metric; the metric is selected from the group consisting of: a Levenshtein distance; a normalized Levenshtein distance; a trigram score; and a trigram ratio; the computer system provides a parallel computing environment; the computer system comprises a plurality of worker nodes; and the processing is conducted by the plurality of worker nodes.
16. A computer-implemented method for efficiently processing a large number of records, the method comprising: estimating, by a computer system, a probability that subject data is in a single class by: obtaining, by the computer system, a plurality of records, the plurality of records comprising subject data and each record of the plurality of records comprises a customer profile; obtaining, by the computer system, two or more probability estimation trees, each of the two or more probability estimation trees comprising multiple paths and at least one of the two or more probability estimation trees is programmed to identify records within the plurality of records that are likely to correspond to a common customer or household; and processing, by the computer system, the plurality of records through at least two of the two or more probability estimation trees, the processing in each of the at least two of the two or more probability estimation trees comprising: arriving of the subject data at a distinction node of the each of the at least two of the two or more probability estimation trees, the distinction node corresponding to a metric characterizing a similarity between character strings of compared records of the plurality of records; calculating, by the computer system after the arriving of the subject data at the distinction node of the each of the at least two of the two or more probability estimation trees, a value corresponding to the metric; making a distinction based on the value; selecting a path of the multiple paths based at least in part on the distinction; directing the subject data to another distinction node of the each of the at least two of the two or more probability estimation trees; and estimating a class for the subject data based at least in part on the path of the multiple paths selected; and combining, by the computer system, the class that is estimated of the at least two of the two or more probability estimation trees into the single class.
17. The method of claim 16, wherein the metric is selected from the group consisting of: a Levenshtein distance; a normalized Levenshtein distance; a trigram score; and a trigram ratio.
18. The method of claim 16, wherein the computer system provides a parallel computing environment.
19. The method of claim 18, wherein the computer system comprises a plurality of worker nodes.
20. The method of claim 19, wherein the processing is conducted by the plurality of worker nodes.
21. The method of claim 16, wherein: the metric is selected from the group consisting of: a Levenshtein distance; a normalized Levenshtein distance; a trigram score; and a trigram ratio; the computing system provides a parallel computing environment; the computer system comprises a plurality of worker nodes; and the processing is conducted by the plurality of worker nodes.
22. A computer system comprising: one or more processing modules; and one
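Claims 7, 13, and 17 name four string-similarity metrics that a distinction node may compute. The sketch below shows one common way to calculate each in Python. The Levenshtein distance follows its standard definition; the other three are assumptions, because the claim preview does not define them: here the normalized distance divides by the longer string's length, the trigram score counts shared distinct trigrams, and the trigram ratio divides that count by the number of distinct trigrams in either string. All function names are hypothetical.

```python
from typing import Set


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn one string into the other."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Assumed normalization: distance divided by the longer length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def trigrams(text: str) -> Set[str]:
    """Distinct three-character substrings of the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_score(a: str, b: str) -> int:
    """Assumed definition: number of distinct trigrams the strings share."""
    return len(trigrams(a) & trigrams(b))


def trigram_ratio(a: str, b: str) -> float:
    """Assumed definition: shared trigrams over all distinct trigrams."""
    union = trigrams(a) | trigrams(b)
    return len(trigrams(a) & trigrams(b)) / len(union) if union else 0.0


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(normalized_levenshtein("kitten", "sitting"))
    print(trigram_score("smith", "smithe"))
    print(trigram_ratio("smith", "smithe"))
```

The Levenshtein calculation takes time proportional to the product of the two string lengths, which suggests why, across hundreds of millions of record comparisons, deferring it until a record reaches the relevant node could save substantial work.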

Assignees

Wal Mart Stores Inc

Inventors

Classifications

G06F16/215 · Primary
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
G06F17/30303 · Primary
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 51790197

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9336249B2 cover?: A computer-implemented method is disclosed for efficiently processing a large number of records. In the method, a computer system may obtain a plurality of records and a decision tree. The decision tree may include a distinction node corresponding to a distinction requiring completion of a computation. Due to the fact that the computation may be, in the overall context of the process, computati…
Who is the assignee on this patent?: Wal Mart Stores Inc