Systems and methods for evaluating diversity of content based on content properties
US-2018025087-A1 · Jan 25, 2018 · US
US10896189B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10896189-B2 |
| Application number | US-201816481030-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 10, 2018 |
| Priority date | Aug 11, 2017 |
| Publication date | Jan 19, 2021 |
| Grant date | Jan 19, 2021 |
Official abstract text for this publication.
An information entropy-based metric is used to represent a degree of diversity of a search result of genealogical records. In response to a query, a data query server locates a set of multiple records that match the query. The records are classified into different record types based on the records' attributes. One or more distributions of numbers of records classified into each record type are determined. Each distribution corresponds to one of the subsets of the records. For each distribution, an entropy value is determined. A cumulative entropy that corresponds to a sum of the entropy values of those distributions is then determined. The cumulative entropy may serve as the entropy-based metric of the search result. The cumulative entropy may also be normalized by an ideal cumulative entropy. The normalized metric allows the diversity of different search results to be compared across different queries that may generate different numbers of records.
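The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation and rests on several assumptions: the subsets are taken to be the rank prefixes (top-2, top-3, and so on, following claims 1 and 9), entropy uses the natural logarithm, the ideal entropy for a prefix of k records is assumed to be the entropy of the most even split of k records across the record types, and the number of record types defaults to the number of distinct types in the result. All function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(counts):
    """Shannon entropy (natural log) of a list of per-type record counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def cumulative_entropy(record_types, min_size=2):
    """Sum the entropies of the rank prefixes top-min_size ... top-n."""
    return sum(
        entropy(list(Counter(record_types[:k]).values()))
        for k in range(min_size, len(record_types) + 1)
    )


def ideal_cumulative_entropy(n, num_types, min_size=2):
    """Assumed ideal: each prefix of size k is split as evenly as possible."""
    total = 0.0
    for k in range(min_size, n + 1):
        base, extra = divmod(k, num_types)
        counts = [base + 1] * extra + [base] * (num_types - extra)
        total += entropy(counts)
    return total


def normalized_cumulative_entropy(record_types, num_types=None, min_size=2):
    """Cumulative entropy divided by the assumed ideal, on a 0-to-1 scale."""
    if num_types is None:
        num_types = len(set(record_types))  # assumption: types seen in the result
    ideal = ideal_cumulative_entropy(len(record_types), num_types, min_size)
    if ideal == 0:
        return 0.0  # a single record type (or too few records) has no diversity
    return cumulative_entropy(record_types, min_size) / ideal


# Record types listed in rank order, best match first.
clustered = ["birth", "birth", "birth", "death", "death", "death"]
mixed = ["birth", "death", "birth", "death", "birth", "death"]
print(normalized_cumulative_entropy(clustered))
print(normalized_cumulative_entropy(mixed))
```

Because every prefix contains the top-ranked records, those records contribute to more entropy terms than lower-ranked ones, which is how the sum weights the top of the list more heavily. Under these assumptions the interleaved ordering reaches the ideal, while the clustered ordering scores lower even though both lists contain the same records.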
Opening claim text (preview).
The invention claimed is:

1. A computer-implemented method, comprising: accessing a set of genealogical records based on a search query, each genealogical record comprising one or more attributes; ranking the set of genealogical records in a rank order; classifying the genealogical records into a plurality of record types based on the one or more attributes of the genealogical records; selecting, based on the rank order, a plurality of subsets from the set of genealogical records, each subset different from another subset, wherein the plurality of subsets comprise a first subset of genealogical records and a second subset that includes the first subset and at least an additional lower-ranked genealogical record that is ranked lower than any genealogical record in the first subset; determining a plurality of distributions of numbers of genealogical records that are classified into each of the plurality of record types, each of the plurality of distributions corresponding to one of the plurality of subsets; determining an entropy-based metric corresponding to the set of genealogical records, wherein the entropy-based metric represents a degree of diversity of the set of genealogical records in the rank order, wherein determining the entropy-based metric comprises: determining a plurality of entropy values of the plurality of subsets, each subset having an entropy value that is determined based on the distribution corresponding to the subset, wherein at least a first distribution corresponds to the genealogical records in the first subset and a second distribution corresponds to the genealogical records in the second subset that includes the first subset and at least the additional lower-ranked genealogical record; adding the plurality of entropy values together to determine a sum of the entropy values of the plurality of subsets; determining a cumulative entropy that corresponds to the sum of the entropy values of the plurality of subsets, the cumulative entropy being the entropy-based metric, wherein, for the cumulative entropy, the genealogical records in the first subset are weighted heavier than the additional lower-ranked genealogical record; and generating an indication of the degree of diversity of the set of genealogical records.

2. The computer-implemented method of claim 1, wherein at least one of the entropy values of the plurality of subsets is determined based on: $E(Q) = -\sum_{i=1}^{K} p_i \log p_i$

3. The computer-implemented method of claim 1, wherein determining the entropy-based metric further comprises: determining an ideal cumulative entropy; and determining a normalized cumulative entropy that is based on the cumulative entropy normalized by the ideal entropy, the normalized cumulative entropy being the entropy-based metric instead of the cumulative entropy.

4. The computer-implemented method of claim 3, wherein the normalized cumulative entropy is normalized to a scale between 0 and 1, and the computer-implemented method further comprises: comparing the normalized cumulative entropy to a threshold that is pre-set to be between 0 and 1; responsive to the normalized cumulative entropy being below the threshold, re-ranking the set of genealogical records.

5. The computer-implemented method of claim 3, wherein determining the ideal cumulative entropy comprises: determining maximum entropies of each of the plurality of distributions, each distribution having a maximum entropy based on a number of genealogical records in the distribution and a number of record types in the distribution; and summing the maximum entropies.

6. The computer-implemented method of claim 1, wherein the one or more attributes used to classify each of the genealogical records into one of the plurality of record types are data categories selected from one or more of: birth, marriage, death, residence, immigration, military, court, or directories.

7. The computer-implemented method of claim 1, further comprising: comparing the entropy-based metric to a threshold; and responsive to the entropy-based metric being below the threshold, re-ranking the set of genealogical records.

8. The computer-implemented method of claim 7, wherein a re-ranked set of genealogical records, which is re-ranked from an original set, has a value of entropy-based metric that is higher than the original set.

9. The computer-implemented method of claim 1, wherein determining the plurality of distributions comprises: selecting the subsets of genealogical records from the set of genealogical records based on a rank order of the set based on criteria of: (i) having two or more genealogical records in each subset, and (ii) the two or more genealogical records of the subset being within a threshold distance of each other by the rank order; determining a distribution for each of the subsets by counting a number of records that are classified into each record type.

10. The computer-implemented method of claim 9, wherein each of the subsets is smaller than the set.

11. The computer-implemented method of claim 10, wherein each of the subsets has a different number of genealogical records.

12. A computer-implemented method, comprising: accessing a set of genealogical records that correspond to a rank order; determining an entropy value associated with each ranked position in the set of genealogical records, the entropy value associated with each ranked position corresponding to a distribution of a subset of genealogical records that are selected based on the ranked position, wherein at least a first distribution that corresponds to a first ranked position includes a first subset of genealogical records and a second distribution that corresponds to a second ranked position includes the first subset and at least an additional lower-ranked genealogical record; determining an entropy-based metric based on the entropy values of the ranked positions in the set of genealogical records, wherein determining the entropy-based metric comprises: adding the entropy values of the ranked positions in the set of genealogical records to determine a sum of the entropy values associated with the ranked positions; determining a cumulative entropy that corresponds to the sum of the entropy values associated with the ranked positions, the cumulative entropy being the entropy-based metric, wherein, for the cumulative entropy, the genealogical records in the first subset are weighted heavier than the additional lower-ranked genealogical record; responsive to the entropy-based metric being lower than a threshold of a predetermined value of cumulative entropy, re-determining the rank order;
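Claims 7, 8, and 12 tie the metric to a re-ranking step: when the cumulative entropy falls below a threshold, the rank order is re-determined, and the re-ranked set should score higher than the original. The claims shown here do not say how the re-ranking is done, so the sketch below substitutes a simple round-robin interleave by record type purely for illustration. The record layout (dictionaries with `id` and `type` keys), the function names, and the threshold value are all assumptions, and the subsets are again assumed to be rank prefixes of two or more records.

```python
from collections import Counter, defaultdict
from itertools import zip_longest
from math import log


def prefix_entropy_sum(types, min_size=2):
    """Cumulative entropy: sum of Shannon entropies of the top-k prefixes."""
    total = 0.0
    for k in range(min_size, len(types) + 1):
        counts = Counter(types[:k]).values()
        total -= sum((c / k) * log(c / k) for c in counts)
    return total


def interleave_by_type(records):
    """Hypothetical re-ranker: round-robin over record types,
    preserving the original order within each type."""
    groups = defaultdict(list)
    for record in records:
        groups[record["type"]].append(record)
    rounds = zip_longest(*groups.values())
    return [r for rnd in rounds for r in rnd if r is not None]


def rerank_if_low_diversity(records, threshold):
    """Re-rank only when the metric is below the threshold, and keep the
    new order only if it scores higher than the original (as in claim 8)."""
    score = prefix_entropy_sum([r["type"] for r in records])
    if score >= threshold:
        return records
    candidate = interleave_by_type(records)
    new_score = prefix_entropy_sum([r["type"] for r in candidate])
    return candidate if new_score > score else records


results = [
    {"id": 1, "type": "birth"},
    {"id": 2, "type": "birth"},
    {"id": 3, "type": "birth"},
    {"id": 4, "type": "death"},
    {"id": 5, "type": "death"},
    {"id": 6, "type": "marriage"},
]
reranked = rerank_if_low_diversity(results, threshold=3.0)
print([r["id"] for r in reranked])
```

A production re-ranker would presumably also have to balance diversity against relevance; this sketch ignores relevance scores and only shows the threshold check followed by a comparison of the metric before and after re-ranking.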
using probabilistic model · CPC title
using ranking · CPC title
Complex mathematical operations {(function generation by table look-up G06F1/03; evaluation of elementary functions by calculation G06F7/544)} · CPC title
in federated or virtual databases · CPC title
Approximate or statistical queries · CPC title