US9336249B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9336249-B2 |
| Application number | US-201313874281-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 30, 2013 |
| Priority date | Apr 30, 2013 |
| Publication date | May 10, 2016 |
| Grant date | May 10, 2016 |
Official abstract text for this publication.
A computer-implemented method is disclosed for efficiently processing a large number of records. In the method, a computer system may obtain a plurality of records and a decision tree. The decision tree may include a distinction node corresponding to a distinction that requires completing a computation. Because the computation may be expensive relative to the overall process, it may initially be left uncomputed. Accordingly, if the distinction node is never reached while records are being processed, no computation time is wasted. However, if and when the distinction node is reached, the computer system may complete the computation and make the distinction based on its results.
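The abstract's central idea is deferring a distinction node's computation until a record actually reaches that node. A minimal Python sketch can illustrate it. The sketch assumes a binary tree whose distinction nodes compare one computed value against a threshold. The classes `Leaf` and `DistinctionNode`, the functions `classify`, `cheap` and `expensive`, and the sample fields `age` and `name` are all hypothetical and are not taken from the patent. The `calls` counter exists only to make the savings visible.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union

Record = Dict[str, Any]


@dataclass
class Leaf:
    label: str


@dataclass
class DistinctionNode:
    compute: Callable[[Record], float]  # deferred, possibly expensive computation
    threshold: float
    left: "Node"    # followed when the computed value <= threshold
    right: "Node"   # followed when the computed value > threshold
    calls: int = 0  # number of times the computation actually ran

    def decide(self, record: Record) -> "Node":
        # The computation runs only now, when a record reaches this node.
        self.calls += 1
        value = self.compute(record)
        return self.left if value <= self.threshold else self.right


Node = Union[Leaf, DistinctionNode]


def classify(node: Node, record: Record) -> str:
    # Walk from the root to a leaf, computing only the nodes on the path taken.
    while isinstance(node, DistinctionNode):
        node = node.decide(record)
    return node.label


def cheap(record: Record) -> float:
    return record["age"]


def expensive(record: Record) -> float:
    # Hypothetical stand-in for a costly computation.
    return float(len(record["name"]))


expensive_node = DistinctionNode(expensive, 5.0, Leaf("short name"), Leaf("long name"))
root = DistinctionNode(cheap, 30, Leaf("young"), expensive_node)

records = [
    {"age": 25, "name": "Ann"},
    {"age": 40, "name": "Bartholomew"},
    {"age": 22, "name": "Zoe"},
]
print([classify(root, r) for r in records])  # ['young', 'long name', 'young']
print(expensive_node.calls)                    # 1
```

Only the second record reaches `expensive_node`, so the costly computation runs once instead of three times. The root's cheap test runs for every record.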
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for efficiently processing a large number of records, the method comprising: estimating, by a computer system, a probability that subject data is in a single class by: obtaining, by the computer system, a plurality of records, the plurality of records comprising the subject data; obtaining, by the computer system, two or more probability estimation trees, each of the two or more probability estimation trees comprising multiple paths; processing, by the computer system, the plurality of records through at least two of the two or more probability estimation trees the processing in each of the at least two of the two or more probability estimation trees comprising: arriving of the subject data, at a distinction node of the each of the at least two of the two or more probability estimation trees, the distinction node corresponding to a distinction requiring completion of an as yet uncomputed computation; completing, by the computer system after the arriving of the subject data at the distinction node of the each of the at least two of the two or more probability estimation trees, the computation; making the distinction based on results of the computation; selecting a path of the multiple paths based at least in part on the distinction; directing the subject data to another distinction node of the each of the at least two of the two or more probability estimation trees; and estimating a class for the subject data based at least in part on the path of the multiple paths selected; and combining, by the computer system, the class that is estimated of the at least two of the two or more probability estimation trees into the single class.
2. The method of claim 1, wherein at least one of the two or more probability estimation trees is programmed to perform record linkage.
3. The method of claim 2, wherein each record of the plurality of records comprises a customer profile.
4. The method of claim 3, wherein the at least one of the two or more probability estimation trees is programmed to identify records within the plurality of records that are likely to correspond to a common customer or household.
5. The method of claim 4, wherein the distinction corresponds to a metric characterizing a similarity between attributes of compared records of the plurality of records.
6. The method of claim 5, wherein the computation comprises calculation of the metric.
7. The method of claim 6, wherein the metric is selected from the group consisting of: a Levenshtein distance; a normalized Levenshtein distance; a trigram score; and a trigram ratio.
8. The method of claim 1, wherein the computer system provides a parallel computing environment.
9. The method of claim 8, wherein the computer system comprises a plurality of worker nodes.
10. The method of claim 9, wherein the processing is conducted by the plurality of worker nodes.
11. The method of claim 1, wherein the distinction corresponds to a metric characterizing a similarity between attributes of compared records of the plurality of records.
12. The method of claim 11, wherein the computation comprises calculation of the metric.
13. The method of claim 12, wherein the metric is selected from the group consisting of: a Levenshtein distance; a normalized Levenshtein distance; a trigram score; and a trigram ratio.
14. The method of claim 1, wherein: each record of the plurality of records comprises a customer profile; and the plurality of records comprises at least five hundred million records.
15. The method of claim 1, wherein: at least one of the two or more probability estimation trees is programmed to perform record linkage; each record of the plurality of records comprises a customer profile; the at least one of the two or more probability estimation trees is programmed to identify records within the plurality of records that are likely to correspond to a common customer or household; the distinction corresponds to a metric characterizing a similarity between attributes of compared records of the plurality of records; the computation comprises calculation of the metric; the metric is selected from the group consisting of: a Levenshtein distance; a normalized Levenshtein distance; a trigram score; and a trigram ratio; the computer system provides a parallel computing environment; the computer system comprises a plurality of worker nodes; and the processing is conducted by the plurality of worker nodes.
16. A computer-implemented method for efficiently processing a large number of records, the method comprising: estimating, by a computer system, a probability that subject data is in a single class by: obtaining, by the computer system, a plurality of records, the plurality of records comprising subject data and each record of the plurality of records comprises a customer profile; obtaining, by the computer system, two or more probability estimation trees, each of the two or more probability estimation trees comprising multiple paths and at least one of the two or more probability estimation trees is programmed to identify records within the plurality of records that are likely to correspond to a common customer or household; and processing, by the computer system, the plurality of records through at least two of the two or more probability estimation trees, the processing in each of the at least two of the two or more probability estimation trees comprising: arriving of the subject data at a distinction node of the each of the at least two of the two or more probability estimation trees, the distinction node corresponding to a metric characterizing a similarity between character strings of compared records of the plurality of records; calculating, by the computer system after the arriving of the subject data at the distinction node of the each of the at least two of the two or more probability estimation trees, a value corresponding to the metric; making a distinction based on the value; selecting a path of the multiple paths based at least in part on the distinction; directing the subject data to another distinction node of the each of the at least two of the two or more probability estimation trees; and estimating a class for the subject data based at least in part on the path of the multiple paths selected; and combining, by the computer system, the class that is estimated of the at least two of the two or more probability estimation trees into the single class.
17. The method of claim 16, wherein the metric is selected from the group consisting of: a Levenshtein distance; a normalized Levenshtein distance; a trigram score; and a trigram ratio.
18. The method of claim 16, wherein the computer system provides a parallel computing environment.
19. The method of claim 18, wherein the computer system comprises a plurality of worker nodes.
20. The method of claim 19, wherein the processing is conducted by the plurality of worker nodes.
21. The method of claim 16, wherein: the metric is selected from the group consisting of: a Levenshtein distance; a normalized Levenshtein distance; a trigram score; and a trigram ratio; the computing system provides a parallel computing environment; the computer system comprises a plurality of worker nodes; and the processing is conducted by the plurality of worker nodes.
22. A computer system comprising: one or more processing modules; and one
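
Claims 1, 7 and 16 combine three ideas:
- string-similarity metrics (Levenshtein distance, normalized Levenshtein distance, trigram score, trigram ratio) computed lazily at distinction nodes;
- two or more probability estimation trees applied to the same pair of records;
- combining the per-tree results into a single class.

The following is a minimal record-linkage sketch under stated assumptions. The excerpt does not define normalized Levenshtein distance, trigram score, or trigram ratio. The formulas used here are assumptions: edit distance divided by the longer string length, the count of shared padded trigrams, and the Jaccard ratio of the trigram sets. Averaging the trees' probabilities with a 0.5 cutoff is also an assumed combining rule. The tree shapes, thresholds, leaf probabilities, field names, and the names `Leaf`, `Split`, `estimate` and `combine` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    # Assumption: distance divided by the longer length (0.0 means identical).
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def trigrams(s: str) -> set:
    # Assumption: lowercase and pad with spaces so word edges form trigrams.
    s = "  " + s.lower() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_score(a: str, b: str) -> int:
    # Assumption: the number of shared trigrams.
    return len(trigrams(a) & trigrams(b))


def trigram_ratio(a: str, b: str) -> float:
    # Assumption: shared trigrams over all distinct trigrams (Jaccard index).
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


@dataclass
class Leaf:
    p_same: float  # estimated probability that both records are the same customer


@dataclass
class Split:
    metric: Callable[[str, str], float]
    attr: str       # hypothetical attribute name, e.g. "name"
    threshold: float
    low: "Node"     # taken when the metric value <= threshold
    high: "Node"    # taken when the metric value > threshold
    calls: int = 0  # how many times the metric was actually computed

    def step(self, a: Dict[str, str], b: Dict[str, str]) -> "Node":
        self.calls += 1  # the metric is computed only when a pair arrives here
        value = self.metric(a[self.attr], b[self.attr])
        return self.low if value <= self.threshold else self.high


Node = Union[Leaf, Split]


def estimate(node: Node, a: Dict[str, str], b: Dict[str, str]) -> float:
    while isinstance(node, Split):
        node = node.step(a, b)
    return node.p_same


def combine(trees: List[Node], a: Dict[str, str], b: Dict[str, str]) -> str:
    # Assumption: average the per-tree probabilities and threshold at 0.5.
    p = sum(estimate(t, a, b) for t in trees) / len(trees)
    return "same customer" if p >= 0.5 else "different customers"


street_split = Split(trigram_ratio, "street", 0.5, Leaf(0.4), Leaf(0.95))
tree_a = Split(normalized_levenshtein, "name", 0.25, street_split, Leaf(0.05))
tree_b = Split(levenshtein, "zip", 0, Leaf(0.9),
               Split(trigram_score, "name", 3, Leaf(0.1), Leaf(0.6)))
trees = [tree_a, tree_b]

a = {"name": "Jon Smith", "street": "12 Oak Street", "zip": "94105"}
b = {"name": "John Smith", "street": "12 Oak St", "zip": "94105"}
c = {"name": "Mary Jones", "street": "9 Elm Road", "zip": "10001"}

print(combine(trees, a, b))  # same customer
print(combine(trees, a, c))  # different customers
print(street_split.calls)    # 1: the street metric is skipped for (a, c)
```

For the pair (a, c), the name distance already routes the pair away from `street_split`, so the street trigram ratio is never computed. Claims 8 to 10 add parallel processing by worker nodes, which this single-process sketch does not show. Because each pair comparison is independent, one plausible approach is to distribute pairs across processes, for example with Python's `multiprocessing.Pool.map`.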
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Physics · mapped topic