Decision tree representation for big data

US9147168B1 · US · B1

Patent metadata
Publication number: US-9147168-B1
Application number: US-201213722780-A
Country: US
Kind code: B1
Filing date: Dec 20, 2012
Priority date: Dec 20, 2012
Publication date: Sep 29, 2015
Grant date: Sep 29, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, system, and process for representing a decision tree in a tabular format is discussed. The format may contain all the necessary information to traverse the nodes in parallel on a distributed system while consuming an efficient amount of resources. In some embodiments, the tree may be stored in a relational database as a table.
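The tabular layout described in the abstract can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for a relational database. The table name `tree`, the column names, the toy node values, and the choice to store the path as JSON text (SQLite has no array type) are all assumptions made for illustration; they are not taken from the patent.

```python
import json
import sqlite3

# Hypothetical toy tree: each tuple is one node, i.e. one table row.
# (node_id, leftmost_child_id, scv, path_from_root, predicted_class)
NODES = [
    (1, 2, 0.42, [1], "yes"),       # root, a decision node with children 2 and 3
    (2, None, 0.0, [1, 2], "no"),   # leaf
    (3, None, 0.0, [1, 3], "yes"),  # leaf
]


def store_tree(conn, nodes):
    """Create the (assumed) table layout and write one row per node."""
    conn.execute(
        "CREATE TABLE tree ("
        "node_id INTEGER PRIMARY KEY, "
        "leftmost_child_id INTEGER, "
        "scv REAL, "
        "path TEXT, "
        "predicted_class TEXT)"
    )
    conn.executemany(
        "INSERT INTO tree VALUES (?, ?, ?, ?, ?)",
        # The path is serialized to JSON text because SQLite lacks arrays.
        [(nid, lmc, scv, json.dumps(path), cls)
         for nid, lmc, scv, path, cls in nodes],
    )
    conn.commit()


def load_path(conn, node_id):
    """Read back the path from the root for a single node."""
    row = conn.execute(
        "SELECT path FROM tree WHERE node_id = ?", (node_id,)
    ).fetchone()
    return json.loads(row[0])


conn = sqlite3.connect(":memory:")
store_tree(conn, NODES)
```

Because every node is a self-contained row, the whole tree can be copied or queried with ordinary SQL; calling `load_path(conn, 3)` reads the stored path for node 3 without walking the tree.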

First claim

Opening claim text (preview).

What is claimed is:
1. A method for representing a decision tree in a table, comprising: receiving a training dataset; building a decision tree from the training dataset, wherein the decision tree comprises a plurality of nodes; storing each node as an individual row in the table on a non-transitory computer readable medium, wherein the row comprises a leftmost child id, a split criterion value (“SCV”), and a path from a root node; distributing the decision tree to multiple nodes in a massive parallel processing (“MPP”) database cluster; receiving a classification dataset to be classified using the decision tree; dividing the classification dataset into a plurality of segments; and distributing the segments to the multiple nodes in the MPP database cluster.
2. The method of claim 1, further comprising assigning each node a node id and storing the node id in the table.
3. The method of claim 1, further comprising determining the leftmost child id for each node having children and storing the leftmost child id with the node in the table.
4. The method of claim 1, further comprising determining a predicted class for each node and storing the predicted class in the table.
5. The method of claim 4, further comprising calculating the probability that a decision made on a node will result in the predicted class, and storing the probability in the table.
6. The method of claim 1, further comprising calculating the SCV for a decision node in the decision tree and storing the SCV in the table.
7. The method of claim 1, further comprising storing the path from the root node and storing the path as an array in the table.
8. The method of claim 1, wherein the training dataset comprises an attribute, an attribute value, and a class value.
9. A computer program product for representing a decision tree in a database, comprising a non-transitory computer readable medium having program instructions embodied therein for: receiving a training dataset; building a decision tree from the training dataset, wherein the decision tree comprises a plurality of nodes; storing each node as an individual row in the table on a non-transitory computer readable medium, wherein the row comprises a leftmost child id, a split criterion value (“SCV”), and a path from a root node; distributing the decision tree to multiple nodes in a massive parallel processing (“MPP”) database cluster; receiving a classification dataset to be classified using the decision tree; dividing the classification dataset into a plurality of segments; and distributing the segments to the multiple nodes in the MPP database cluster.
10. The computer program product of claim 9, further comprising assigning each node a node id and storing the node id in the table.
11. The computer program product of claim 9, further comprising determining the leftmost child id for each node having children and storing the leftmost child id with the node in the table.
12. The computer program product of claim 9, further comprising determining a predicted class for each node and storing the predicted class in the table.
13. The computer program product of claim 12, further comprising calculating the probability that a decision made on a node will result in the predicted class, and storing the probability in the table.
14. The computer program product of claim 9, further comprising the SCV for a decision node in the decision tree and storing the SCV in the table.
15. The computer program product of claim 9, further comprising storing the path from a root node to each node and storing the path as an array in the table.
16. The computer program product of claim 9, wherein the training dataset comprises an attribute, an attribute value, and a class value.
17. A system for representing a decision tree in a database comprising a non-transitory computer readable medium and a processor configured to: receive a training dataset; build a decision tree from the training dataset, wherein the decision tree comprises a plurality of nodes; store each node as an individual row in the table on a non-transitory computer readable medium, wherein the row comprises a leftmost child id, a split criterion value (“SCV”), and a path from a root node; distribute the decision tree to multiple nodes in a massive parallel processing (“MPP”) database cluster; receive a classification dataset to be classified using the decision tree; divide the classification dataset into a plurality of segments; and distribute the segments to the multiple nodes in the MPP database cluster.
18. The system of claim 17, further comprising determining the leftmost child id for each node having children and storing the leftmost child id with the node in the table.
19. The system of claim 17, further comprising determining a predicted class for each node and storing the predicted class in the table.
20. The system of claim 19, further comprising calculating the probability that a decision made on a node will result in the predicted class, and storing the probability in the table.
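The classification steps in the independent claims (distribute the tree, divide the classification dataset into segments, distribute the segments) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the `feature` and `threshold` columns, the binary split, the convention that a node's children have consecutive ids starting at the leftmost child id, and the round-robin segmentation are all hypothetical, and a plain loop stands in for the nodes of an MPP cluster.

```python
# Hypothetical table rows keyed by node_id. Only leftmost_child_id and
# predicted_class are named in the claims; "feature" and "threshold"
# are assumed extra columns added so the sketch can make decisions.
TREE = {
    1: {"leftmost_child_id": 2, "feature": "age", "threshold": 30,
        "predicted_class": "yes"},
    2: {"leftmost_child_id": None, "feature": None, "threshold": None,
        "predicted_class": "no"},
    3: {"leftmost_child_id": None, "feature": None, "threshold": None,
        "predicted_class": "yes"},
}


def classify(tree, record, root_id=1):
    """Walk from the root to a leaf using only row lookups."""
    node = tree[root_id]
    while node["leftmost_child_id"] is not None:
        # Assumption: binary split with consecutively numbered children,
        # so the chosen child is the leftmost child id plus an offset.
        offset = 0 if record[node["feature"]] < node["threshold"] else 1
        node = tree[node["leftmost_child_id"] + offset]
    return node["predicted_class"]


def split_into_segments(records, n_segments):
    """Divide the classification dataset round-robin into segments."""
    return [records[i::n_segments] for i in range(n_segments)]


def classify_on_cluster(tree, records, n_workers):
    """Simulate the cluster: every worker gets a tree copy and one segment."""
    results = {}
    for segment in split_into_segments(records, n_workers):
        local_tree = dict(tree)  # stands in for distributing the tree
        for record in segment:
            results[record["id"]] = classify(local_tree, record)
    return results


records = [
    {"id": 1, "age": 25},
    {"id": 2, "age": 41},
    {"id": 3, "age": 30},
]
labels = classify_on_cluster(TREE, records, n_workers=2)
```

Since each worker holds the full tree and classifies only its own segment, no record needs data from another worker, which is what makes this kind of traversal straightforward to run in parallel.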

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9147168B1 cover?
A method, system, and process for representing a decision tree in a tabular format is discussed. The format may contain all the necessary information to traverse the nodes in parallel on a distributed system while consuming an efficient amount of resources. In some embodiments, the tree may be stored in a relational database as a table.
Who is the assignee on this patent?
EMC Corp
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Published Sep 29, 2015 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).