Systems and methods for evaluating diversity of content based on content properties
US-2018025087-A1 · Jan 25, 2018 · US
US2019391975A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2019391975-A1 |
| Application number | US-201816481030-A |
| Country | US |
| Kind code | A1 |
| Filing date | Aug 10, 2018 |
| Priority date | Aug 11, 2017 |
| Publication date | Dec 26, 2019 |
| Grant date | — |
Official abstract text for this publication.
An information entropy-based metric is used to represent a degree of diversity of a search result of genealogical records. In response to a query, a data query server locates a set of multiple records that match the query. The records are classified into different record types based on the records' attributes. One or more distributions of numbers of records classified into each record type are determined. Each distribution corresponds to one of the subsets of the records. For each distribution, an entropy value is determined. A cumulative entropy that corresponds to a sum of the entropy values of those distributions is then determined. The cumulative entropy may serve as the entropy-based metric of the search result. The cumulative entropy may also be normalized by an ideal cumulative entropy. The normalized metric allows the diversity of different search results to be compared across different queries that may generate different numbers of records.
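The metric described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation. It assumes that the subsets are the top-m prefixes of the ranked list for m = 2..N (as the dependent claims suggest), that p_i is the fraction of a subset's records falling into record type i, and that the ideal entropy of a subset is the entropy of the most even split of its records across k record types. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(types):
    """Shannon entropy -sum(p_i * log p_i) of a list of record-type labels."""
    n = len(types)
    return -sum((c / n) * math.log(c / n) for c in Counter(types).values())


def max_entropy(n, k):
    """Assumed ideal: entropy of the most even split of n records over k types."""
    if n == 0:
        return 0.0
    base, extra = divmod(n, k)
    counts = [base + 1] * extra + [base] * (k - extra)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def cumulative_entropy(ranked_types):
    """Sum of entropies over the prefix subsets of sizes 2..N (assumed subsets)."""
    return sum(entropy(ranked_types[:m]) for m in range(2, len(ranked_types) + 1))


def normalized_cumulative_entropy(ranked_types, k):
    """Cumulative entropy divided by the ideal cumulative entropy (0 to 1)."""
    ideal = sum(max_entropy(m, k) for m in range(2, len(ranked_types) + 1))
    return cumulative_entropy(ranked_types) / ideal if ideal > 0 else 0.0


clustered = ["birth", "birth", "marriage", "marriage"]
interleaved = ["birth", "marriage", "birth", "marriage"]
print(normalized_cumulative_entropy(clustered, k=2))
print(normalized_cumulative_entropy(interleaved, k=2))
```

Under these assumptions the interleaved ordering scores higher than the clustered one, because record types are mixed from the top of the ranking onward, even though both lists contain the same records.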
Opening claim text (preview).
1. A computer-implemented method, comprising: accessing a set of genealogical records based on a search query, each genealogical record comprising one or more attributes; ranking the set of genealogical records in a rank order; classifying the genealogical records into a plurality of record types based on the one or more attributes of the genealogical records; selecting one or more subsets from the set of genealogical records based on the rank order; determining one or more distributions of numbers of genealogical records that are classified into each of the plurality of record types, each of the one or more distributions corresponding to one of the one or more subsets; and determining an entropy-based metric based on an entropy value of each of the one or more distributions, wherein the entropy-based metric represents a degree of diversity of the set of genealogical records in the rank order, wherein determining the entropy-based metric comprises: determining the entropy values of the one or more distributions, each distribution having an entropy value that is determined based on the numbers of genealogical records that are classified into each of the plurality of the record types of the distribution; and determining a cumulative entropy that corresponds to a sum of the determined entropy values of the one or more distributions, the cumulative entropy being the entropy-based metric.
2. (canceled)
3. The computer-implemented method of claim 1, wherein the entropy values of the one or more distributions are each determined based on: $E(Q) = -\sum_{i=1}^{K} p_i \log p_i$
4. The computer-implemented method of claim 1, wherein determining the entropy-based metric further comprises: determining an ideal cumulative entropy; and determining a normalized cumulative entropy that is based on the cumulative entropy normalized by the ideal entropy, the normalized cumulative entropy being the entropy-based metric instead of the cumulative entropy.
5. The computer-implemented method of claim 4, wherein the normalized cumulative entropy is normalized to a scale between 0 and 1, and the computer-implemented method further comprises: comparing the normalized cumulative entropy to a threshold that is pre-set to be between 0 and 1; responsive to the normalized cumulative entropy being below the threshold, re-ranking the set of genealogical records.
6. (canceled)
7. (canceled)
8. (canceled)
9. The computer-implemented method of claim 4, wherein determining the ideal cumulative entropy comprises: determining maximum entropies of the one or more distributions, each distribution having a maximum entropy based on a number of genealogical records in the distribution and a number of record types in the distribution; and summing the maximum entropies.
10. (canceled)
11. The computer-implemented method of claim 1, wherein the one or more attributes used to classify each of the genealogical records into one of the plurality of record types are data categories selected from the group consisting of: birth, marriage, death, residence, immigration, military, court, and directories.
12. (canceled)
13. The computer-implemented method of claim 1, further comprising: comparing the entropy-based metric to a threshold; and responsive to the entropy-based metric being below the threshold, re-ranking the set of genealogical records.
14. The computer-implemented method of claim 13, wherein a re-ranked set of genealogical records, which is re-ranked from an original set, has a value of entropy-based metric that is higher than the original set.
15. The computer-implemented method of claim 1, wherein determining the one or more distributions comprises: selecting the subsets of genealogical records from the set of genealogical records based on a rank order of the set based on criteria of: (i) having two or more genealogical records in each subset, and (ii) the two or more genealogical records of the subset being within a threshold distance of each other by the rank order; determining a distribution for each of subsets by counting a number of records that are classified into each record type.
16. The computer-implemented method of claim 15, wherein each of the subsets is smaller than the set.
17. The computer-implemented method of claim 16 wherein each of the subsets has different numbers of genealogical records.
18. The computer-implemented method of claim 17, wherein a latter subset from the subsets selected includes one additional genealogical record than a previous subset, the one additional genealogical record being a record immediately succeeding a last record of the previous subset in the rank order.
19. A computer-implemented method, comprising: accessing a set of genealogical records that correspond to a rank order; determining an entropy value associated with each ranked position in the set of genealogical records, the entropy value associated with each ranked position corresponding to a distribution of a subset of genealogical records that are selected based on the ranked position; determining an entropy-based metric based on the entropy values of the ranked positions in the set of genealogical records, wherein determining the entropy-based metric comprises: determining a cumulative entropy that corresponds to a sum of the determined entropy values associated with the ranked positions, the cumulative entropy being the entropy-based metric; and responsive to the entropy-based metric being lower than a threshold, re-determining the rank order.
20. (canceled)
21. The computer-implemented method of claim 19, wherein the subset of genealogical records associated with a ranked position comprises genealogical records that precede the ranked position.
22. The computer-implemented method of claim 19, wherein each of the subset associated with each ranked position has a different number of records.
23. (canceled)
24. The computer-implemented method of claim 19, wherein a latter subset associated with a latter ranked position has one additional genealogical record than a previous subset associated with a previous ranked position immediately preceding the latter ranked position.
25. The computer-implemented method of claim 24, wherein the one additional genealogical record is a record immediately succeeding a last record of the previous subse
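The threshold-triggered re-ranking recited in the claims can be illustrated with a short Python sketch. The claim text shown here does not say how the re-ranking is performed, so the greedy strategy below is purely an assumption chosen for illustration: at each position it picks the remaining record whose type maximizes the entropy of the prefix built so far, keeping the original order on ties. The function names are hypothetical, and the prefix subsets of sizes 2..N are likewise assumed.

```python
import math
from collections import Counter


def entropy(types):
    """Shannon entropy of a list of record-type labels."""
    n = len(types)
    return -sum((c / n) * math.log(c / n) for c in Counter(types).values())


def cumulative_entropy(ranked_types):
    """Sum of entropies over the prefix subsets of sizes 2..N (assumed subsets)."""
    return sum(entropy(ranked_types[:m]) for m in range(2, len(ranked_types) + 1))


def greedy_rerank(ranked_types):
    """Hypothetical re-ranking: greedily maximize the entropy of each prefix."""
    remaining = list(ranked_types)
    result = []
    while remaining:
        # max() returns the first maximal index, so ties keep the original order.
        best = max(range(len(remaining)),
                   key=lambda i: entropy(result + [remaining[i]]))
        result.append(remaining.pop(best))
    return result


def rerank_if_low(ranked_types, threshold):
    """Re-rank only when the cumulative entropy falls below the threshold."""
    if cumulative_entropy(ranked_types) < threshold:
        return greedy_rerank(ranked_types)
    return list(ranked_types)


ranked = ["birth", "birth", "birth", "marriage", "death"]
reranked = rerank_if_low(ranked, threshold=2.0)
print(reranked)
print(cumulative_entropy(ranked), cumulative_entropy(reranked))
```

In this example the re-ranked list moves the marriage and death records ahead of the repeated birth records, and its cumulative entropy is higher than that of the original order, which matches the property recited in claim 14.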
Complex mathematical operations {(function generation by table look-up G06F1/03; evaluation of elementary functions by calculation G06F7/544)} · CPC title
Threshold · CPC title
Clustering or classification · CPC title
using ranking · CPC title
Approximate or statistical queries · CPC title