Diversity evaluation in genealogy search

US2019391975A1 · US · A1

Patent metadata
Publication number: US-2019391975-A1
Application number: US-201816481030-A
Country: US
Kind code: A1
Filing date: Aug 10, 2018
Priority date: Aug 11, 2017
Publication date: Dec 26, 2019
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An information entropy-based metric is used to represent a degree of diversity of a search result of genealogical records. In response to a query, a data query server locates a set of multiple records that match the query. The records are classified into different record types based on the records' attributes. One or more distributions of numbers of records classified into each record type are determined. Each distribution corresponds to one of the subsets of the records. For each distribution, an entropy value is determined. A cumulative entropy that corresponds to a sum of the entropy values of those distributions is then determined. The cumulative entropy may serve as the entropy-based metric of the search result. The cumulative entropy may also be normalized by an ideal cumulative entropy. The normalized metric allows the diversity of different search results to be compared across different queries that may generate different numbers of records.
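
The metric described in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation. It assumes that the subsets are the top-k prefixes of the ranked list (k = 2 up to the full list) and that the ideal entropy of a subset is reached when its records are spread as evenly as possible over the record types. The function names `entropy`, `max_entropy`, and `normalized_cumulative_entropy` are hypothetical.

```python
import math
from collections import Counter


def entropy(record_types):
    """Shannon entropy (natural log) of the record-type counts in one subset.

    Uses sum(p * log(1/p)), which equals -sum(p * log(p)).
    """
    counts = Counter(record_types)
    total = sum(counts.values())
    return sum((c / total) * math.log(total / c) for c in counts.values())


def max_entropy(n_records, n_types):
    """Assumed ideal: entropy when n_records are split as evenly as possible
    across n_types (n_types must be at least 1)."""
    base, extra = divmod(n_records, n_types)
    counts = [base + 1] * extra + [base] * (n_types - extra)
    return sum((c / n_records) * math.log(n_records / c) for c in counts if c > 0)


def normalized_cumulative_entropy(ranked_types, n_types):
    """Sum the entropies of the top-k prefixes and divide by the ideal sum."""
    cumulative = 0.0
    ideal = 0.0
    # Assumption: subsets are prefixes of the ranked list, starting at size 2.
    for k in range(2, len(ranked_types) + 1):
        cumulative += entropy(ranked_types[:k])
        ideal += max_entropy(k, n_types)
    return cumulative / ideal if ideal > 0 else 0.0


# Record types listed in rank order for two hypothetical result lists.
interleaved = ["birth", "marriage", "death", "birth"]
clustered = ["birth", "birth", "marriage", "death"]
score_interleaved = normalized_cumulative_entropy(interleaved, n_types=3)
score_clustered = normalized_cumulative_entropy(clustered, n_types=3)
```

Under these assumptions, a list that mixes record types near the top scores higher than one that places the same records in clusters, and dividing by the ideal sum keeps the score between 0 and 1 regardless of how many records the query returned.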

First claim

Opening claim text (preview).

1. A computer-implemented method, comprising: accessing a set of genealogical records based on a search query, each genealogical record comprising one or more attributes; ranking the set of genealogical records in a rank order; classifying the genealogical records into a plurality of record types based on the one or more attributes of the genealogical records; selecting one or more subsets from the set of genealogical records based on the rank order; determining one or more distributions of numbers of genealogical records that are classified into each of the plurality of record types, each of the one or more distributions corresponding to one of the one or more subsets; and determining an entropy-based metric based on an entropy value of each of the one or more distributions, wherein the entropy-based metric represents a degree of diversity of the set of genealogical records in the rank order, wherein determining the entropy-based metric comprises: determining the entropy values of the one or more distributions, each distribution having an entropy value that is determined based on the numbers of genealogical records that are classified into each of the plurality of the record types of the distribution; and determining a cumulative entropy that corresponds to a sum of the determined entropy values of the one or more distributions, the cumulative entropy being the entropy-based metric.

2. (canceled)

3. The computer-implemented method of claim 1, wherein the entropy values of the one or more distributions are each determined based on: $E(Q) = -\sum_{i=1}^{K} p_i \log p_i$

4. The computer-implemented method of claim 1, wherein determining the entropy-based metric further comprises: determining an ideal cumulative entropy; and determining a normalized cumulative entropy that is based on the cumulative entropy normalized by the ideal entropy, the normalized cumulative entropy being the entropy-based metric instead of the cumulative entropy.

5. The computer-implemented method of claim 4, wherein the normalized cumulative entropy is normalized to a scale between 0 and 1, and the computer-implemented method further comprises: comparing the normalized cumulative entropy to a threshold that is pre-set to be between 0 and 1; responsive to the normalized cumulative entropy being below the threshold, re-ranking the set of genealogical records.

6. (canceled)

7. (canceled)

8. (canceled)

9. The computer-implemented method of claim 4, wherein determining the ideal cumulative entropy comprises: determining maximum entropies of the one or more distributions, each distribution having a maximum entropy based on a number of genealogical records in the distribution and a number of record types in the distribution; and summing the maximum entropies.

10. (canceled)

11. The computer-implemented method of claim 1, wherein the one or more attributes used to classify each of the genealogical records into one of the plurality of record types are data categories selected from the group consisting of: birth, marriage, death, residence, immigration, military, court, and directories.

12. (canceled)

13. The computer-implemented method of claim 1, further comprising: comparing the entropy-based metric to a threshold; and responsive to the entropy-based metric being below the threshold, re-ranking the set of genealogical records.

14. The computer-implemented method of claim 13, wherein a re-ranked set of genealogical records, which is re-ranked from an original set, has a value of entropy-based metric that is higher than the original set.

15. The computer-implemented method of claim 1, wherein determining the one or more distributions comprises: selecting the subsets of genealogical records from the set of genealogical records based on a rank order of the set based on criteria of: (i) having two or more genealogical records in each subset, and (ii) the two or more genealogical records of the subset being within a threshold distance of each other by the rank order; determining a distribution for each of subsets by counting a number of records that are classified into each record type.

16. The computer-implemented method of claim 15, wherein each of the subsets is smaller than the set.

17. The computer-implemented method of claim 16 wherein each of the subsets has different numbers of genealogical records.

18. The computer-implemented method of claim 17, wherein a latter subset from the subsets selected includes one additional genealogical record than a previous subset, the one additional genealogical record being a record immediately succeeding a last record of the previous subset in the rank order.

19. A computer-implemented method, comprising: accessing a set of genealogical records that correspond to a rank order; determining an entropy value associated with each ranked position in the set of genealogical records, the entropy value associated with each ranked position corresponding to a distribution of a subset of genealogical records that are selected based on the ranked position; determining an entropy-based metric based on the entropy values of the ranked positions in the set of genealogical records, wherein determining the entropy-based metric comprises: determining a cumulative entropy that corresponds to a sum of the determined entropy values associated with the ranked positions, the cumulative entropy being the entropy-based metric; and responsive to the entropy-based metric being lower than a threshold, re-determining the rank order.

20. (canceled)

21. The computer-implemented method of claim 19, wherein the subset of genealogical records associated with a ranked position comprises genealogical records that precede the ranked position.

22. The computer-implemented method of claim 19, wherein each of the subset associated with each ranked position has a different number of records.

23. (canceled)

24. The computer-implemented method of claim 19, wherein a latter subset associated with a latter ranked position has one additional genealogical record than a previous subset associated with a previous ranked position immediately preceding the latter ranked position.

25. The computer-implemented method of claim 24, wherein the one additional genealogical record is a record immediately succeeding a last record of the previous subse…
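
The threshold step in claims 13, 14, and 19 can be sketched as follows. The claims do not say how the records are re-ranked, so the round-robin interleaving below is purely a hypothetical stand-in, as are the function names, the `"type"` and `"id"` keys, and the threshold values. The cumulative entropy again assumes top-k prefix subsets starting at k = 2.

```python
import math
from collections import Counter, defaultdict, deque


def prefix_entropy_sum(types):
    """Cumulative entropy: sum of Shannon entropies of the top-k prefixes."""
    total = 0.0
    for k in range(2, len(types) + 1):
        counts = Counter(types[:k])
        total += sum((c / k) * math.log(k / c) for c in counts.values())
    return total


def round_robin_rerank(records):
    """Hypothetical re-ranker: interleave record types while keeping the
    original rank order within each type."""
    queues = defaultdict(deque)
    for rec in records:
        queues[rec["type"]].append(rec)
    reranked = []
    while any(queues.values()):
        for queue in queues.values():
            if queue:
                reranked.append(queue.popleft())
    return reranked


def rerank_if_needed(records, threshold):
    """Re-rank only when the metric is below the threshold, and keep the
    new order only if it actually raises the metric."""
    score = prefix_entropy_sum([r["type"] for r in records])
    if score >= threshold:
        return records
    candidate = round_robin_rerank(records)
    new_score = prefix_entropy_sum([r["type"] for r in candidate])
    return candidate if new_score > score else records


# Hypothetical search result, already in rank order.
results = [
    {"id": 1, "type": "birth"},
    {"id": 2, "type": "birth"},
    {"id": 3, "type": "death"},
    {"id": 4, "type": "death"},
]
reranked = rerank_if_needed(results, threshold=1.5)
unchanged = rerank_if_needed(results, threshold=1.0)
```

In this example the original order has a cumulative entropy of roughly 1.33, so it is re-ranked against a threshold of 1.5 and left alone against a threshold of 1.0. Accepting the candidate only when its score is higher mirrors the condition in claim 14.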

Assignees

Inventors

Classifications

  • Complex mathematical operations {(function generation by table look-up G06F1/03; evaluation of elementary functions by calculation G06F7/544)} · CPC title

  • Threshold · CPC title

  • Clustering or classification · CPC title

  • using ranking · CPC title

  • Approximate or statistical queries · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019391975A1 cover?
An information entropy-based metric is used to represent a degree of diversity of a search result of genealogical records. In response to a query, a data query server locates a set of multiple records that match the query. The records are classified into different record types based on the records' attributes. One or more distributions of numbers of records classified into each record type are …
Who is the assignee on this patent?
Ancestry Com Dna Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/2462. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 26, 2019 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).