Decision-tree based quantitative and qualitative record classification

US9292599B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9292599-B2
Application number: US-201313874299-A
Country: US
Kind code: B2
Filing date: Apr 30, 2013
Priority date: Apr 30, 2013
Publication date: Mar 22, 2016
Grant date: Mar 22, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for classifying records by sorting records based on both quantitative and qualitative information at a node in a decision tree. Technologies are also disclosed for determining whether records are linked together by applying quantitative and qualitative information at the same nodes in a decision tree. Furthermore, improvements to decision trees are disclosed in terms of the generation and/or training of decision trees that harnesses additional information in the quantitative and qualitative aspects that a unit of data relevant to a single node, and/or the relationships between these aspects, may provide a machine learning algorithm.
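The idea of one node sorting on both quantitative and qualitative information can be illustrated with a minimal sketch. It is not the patented implementation: the class name `HybridNode`, the path labels, the use of `None` for missing data, and the choice to test set membership before the numeric threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridNode:
    """Hypothetical node holding both a real value and a set value."""

    threshold: float        # predetermined real value (assumed name)
    qualitative: frozenset  # predetermined set value; may contain None for missing data

    def select_path(self, value) -> str:
        # Qualitative check: is the unit of data an element of the node's set?
        if value in self.qualitative:
            return "in_set"
        # Quantitative check: compare a numeric unit of data with the threshold.
        if isinstance(value, (int, float)):
            return "le" if value <= self.threshold else "gt"
        # Anything else (e.g. an unexpected string) gets its own path in this sketch.
        return "other"
```

With `HybridNode(threshold=30.0, qualitative=frozenset({None, "declined"}))`, a missing value or the string `"declined"` follows the `in_set` path, while `25` and `42.5` follow the `le` and `gt` paths respectively.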

First claim

Opening claim text (preview).

What is claimed is:

1. A decision-tree system comprising: one or more processing modules; and one or more non-transitory storage modules storing computing instructions configured to run on the one or more processing modules and perform acts of: storing structured data comprising a decision tree in a data store; storing a record to be analyzed according to the decision tree in an additional data store; storing one or more training records in a training data store; selecting, by a distinction module, between multiple paths extending from a node in the decision tree based at least in part on a unit of data from the record, the unit of data from the record carrying information relevant to a distinction of the node, the distinction module further comprising: a real-value module operable to make a first comparison between the unit of data from the record and a predetermined real value for the distinction of the node; and a set-value module operable to make a second comparison between the unit of data from the record and a predetermined set value for the distinction of the node, the set-value module is operable to make the second comparison by determining whether the unit of data from the record comprises an element of a set, the set defined by the predetermined set value for the distinction of the node, and the predetermined set value for the distinction of the node defines the set that comprises a unit of data with missing data; the distinction module further operable to select a path from the multiple paths based at least in part on at least one of: the first comparison; or the second comparison; accounting, by a training module operable to train the decision tree, for a known path of one of the one or more training records from the multiple paths extending from the node by both: a first relationship between a unit of training data from the one of the one or more training records and the predetermined real value for the distinction of the node; and a second relationship between the unit of training data and the predetermined set value for the distinction of the node; generating, by the training module, the decision tree in the data store as a probability estimation tree (PET) from the one or more training records by a machine learning algorithm; and using, by the machine learning algorithm, the missing data to: determine one or more nodes of the decision tree; or set the predetermined real value for the distinction of the node.

2. The system of claim 1, wherein: the one or more non-transitory storage modules storing the computing instructions configured to run on the one or more processing modules and further perform the acts of: storing a comparison record in at least one of: the data store; the additional data store; or another data store; and providing, by a comparison module, the distinction module with a unit of comparison data from the comparison record, the unit of comparison data carrying information relevant to the distinction of the node; and wherein: the real-value module is operable to determine if a variance between a real value of the unit of data from the record and a comparison real value of the unit of comparison data from the comparison record is within a tolerance set by the predetermined real value for the distinction of the node.

3. The system of claim 2, wherein the training module generates the decision tree to determine a target feature, the target feature being whether two records are linked by pertaining to a common individual or household.

4. The system of claim 1, wherein the predetermined set value defines a set that comprises a unit of data for which a request to provide data for the unit of data has been declined.

5. The system of claim 1, wherein: the one or more non-transitory storage modules storing the computing instructions are configured to run on the one or more processing modules and further perform the acts of: storing a comparison record in at least one of: the data store; the additional data store; or another data store; and providing, by a comparison module, the distinction module with a unit of comparison data from the comparison record, the unit of comparison data carrying information relevant to the distinction of the node; the real-value module is operable to determine if a variance between a real value of the unit of data from the record and a comparison real value of the unit of comparison data from the comparison record is within a tolerance set by the predetermined real value for the distinction of the node; the training module generates the decision tree to determine a target feature, the target feature being whether two records are linked by pertaining to a common individual or household; and the predetermined set value defines a set that comprises a unit of data for which a request to provide data for the unit of data has been declined.

6. The system of claim 1, wherein: the PET comprises: the node with different combinations of real values and set values from the one or more training records relative to the predetermined real value for the distinction of the node and the predetermined set value for the distinction of the node assigned by the training module to different paths extending from the node according to known paths of corresponding training records; and a leaf at a terminal end of the PET comprising a conditional probability distribution generated by the training module for a target feature, the conditional probability distribution informed by both real values and set values relative to the predetermined real value for the distinction of the node and the predetermined set value for the distinction of the node respectively.

7. A decision-tree system comprising: one or more processing modules; and one or more non-transitory storage modules storing computing instructions configured to run on the one or more processing modules and perform acts of: storing a decision tree within a data store; storing multiple records within the data store, at least a portion of the multiple training records are known to be linked to another record within the multiple training records; by a distinction module: extracting a first unit of data from a first record and a second unit of data from a second record with information allowing the distinction module to select from among multiple paths extending from a node of the decision tree in response to a distinction of the node; and selecting a path from the multiple paths based on a real value comparison and a determination with respect to set inclusion; comparing, by a real-value module, a first real value of the first unit of data and a second real value of the second unit of data; determining, by a set-value module, whether the first unit of data is a member of a set defined by a set value of the node; determining, by the set-value module, that the first unit of data is a member of a set where the first unit of data is missing data and to determine that the first unit of data is not a member of a set where the first unit of data is not missing data; by a training module: generating, by a machine learning algorithm, one or more nodes of the decision tree with corresponding distinctions, a predetermined real value for the distinction of the node and set values, and one or more leaves with conditional probability distributions providing a probability estimate as to whether a record is linked to another record; and assigning paths for various combinations of a relationship between real values of two records and the predetermined real value for the distinction of the node and a determination as to whether a unit of data is a member of the set defined by the set value of the node to create the one or more leaves with
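The record-linkage claims combine a tolerance check between two records with a set-membership check for missing data, and attach a probability estimate to each leaf. The sketch below shows one way such a single-node probability estimation tree could look. It is an illustration under stated assumptions, not the claimed system: the function names `branch` and `train_leaves`, the path labels, the use of `None` for missing data, the choice to treat a missing value in either record as set membership, and the raw-frequency leaf estimate are all hypothetical.

```python
from collections import Counter, defaultdict

MISSING = None  # assumed sentinel for a unit of data with missing data


def branch(a, b, tolerance, set_value=frozenset({MISSING})):
    """Pick a path for a pair of units of data taken from two records."""
    # Set-value comparison: either value belongs to the node's set (e.g. missing).
    if a in set_value or b in set_value:
        return "in_set"
    # Real-value comparison: is the variance between the two values within tolerance?
    return "within" if abs(a - b) <= tolerance else "outside"


def train_leaves(pairs, tolerance):
    """Estimate P(linked) per path from (a, b, linked) training pairs."""
    counts = defaultdict(Counter)
    for a, b, linked in pairs:
        counts[branch(a, b, tolerance)][linked] += 1
    # Each leaf stores the observed fraction of linked pairs on its path.
    return {path: c[True] / sum(c.values()) for path, c in counts.items()}
```

For example, calling `train_leaves` on labeled pairs of birth years with `tolerance=1` yields one estimate for pairs within a year of each other, one for pairs further apart, and a separate estimate for pairs where a value is missing, so that missing data informs the prediction instead of being discarded.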

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9292599B2 cover?
Systems and methods are disclosed for classifying records by sorting records based on both quantitative and qualitative information at a node in a decision tree. Technologies are also disclosed for determining whether records are linked together by applying quantitative and qualitative information at the same nodes in a decision tree. Furthermore, improvements to decision trees are disclosed in…
Who is the assignee on this patent?
Wal Mart Stores Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 22, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).