Automated assistance for generating relevant and valuable search results for an entity of interest

US11714869B2 · US · B2

Patent metadata
Publication number: US-11714869-B2
Application number: US-202117564056-A
Country: US
Kind code: B2
Filing date: Dec 28, 2021
Priority date: May 2, 2017
Publication date: Aug 1, 2023
Grant date: Aug 1, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided for identifying relevant information for an entity, referred to as a seed entity. A plurality of search queries can be generated each comprising a property of a seed entity or one of the entities associated with the seed entity (seed-linked entities). Preferably, a collection of search queries includes ones representing different properties of the seed entity and properties of different seed-linked entities. Optionally, the collection of search queries is optimized to reduce search burden. Searches can then be conducted with the search queries in one or more data sources to obtain a plurality of search results, wherein each search result comprises a hit entity and one or more entities associated with the hit entity (hit-linked entities). For each of the search results, a score can be determined taking as input (a) likelihood of match between the seed entity and the hit entity or between a seed-linked entity and a hit-linked entity, (b) presence of a new entity in the search result not present in the search queries or a difference between the new entity and an entity present in the search queries, and (c) characteristic of the new entity in the search result. Based on the scores, high priority search results can be presented to a user for further analysis.
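The result-scoring step in the abstract can be illustrated with a minimal Python sketch. The abstract names the three score inputs but not how they are combined, so the weighted sum, the default weights, the 0-to-1 match likelihood, and the names `SearchResult`, `score_result`, and `prioritize` are all hypothetical. Input (b) is simplified here to a presence flag for entities that did not appear in any query; the "difference from a query entity" variant is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """One search result: a hit entity plus its hit-linked entities (hypothetical structure)."""
    hit_entity: str
    hit_linked: set = field(default_factory=set)
    # (a) Assumed to come from an upstream entity-matching step, in [0, 1].
    match_likelihood: float = 0.0
    # (c) Hypothetical per-entity characteristic score in [0, 1].
    traits: dict = field(default_factory=dict)


def score_result(result, query_entities, weights=(0.5, 0.3, 0.2)):
    """Combine inputs (a), (b), and (c) with an assumed weighted sum."""
    w_match, w_new, w_trait = weights
    entities = {result.hit_entity} | result.hit_linked
    # (b) Entities in the result that were not present in any search query.
    new_entities = entities - query_entities
    novelty = 1.0 if new_entities else 0.0
    # (c) Strongest characteristic among the new entities; 0 if none are new.
    trait = max((result.traits.get(e, 0.0) for e in new_entities), default=0.0)
    return w_match * result.match_likelihood + w_new * novelty + w_trait * trait


def prioritize(results, query_entities, top_k=2):
    """Return the top_k highest-scoring results for analyst review."""
    return sorted(
        results, key=lambda r: score_result(r, query_entities), reverse=True
    )[:top_k]
```

Under these assumed weights, a result that brings in a previously unseen linked entity with a high characteristic score can outrank an exact re-match that adds nothing new, which reflects the abstract's emphasis on new entities.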

First claim

Opening claim text (preview).

The invention claimed is:

1. A system for identifying relevant information for an entity comprising: one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the system to: generate a plurality of search queries comprising a seed entity and one or more entities associated with the seed entity, the generation comprising: determining a second entity validated to be linked to the seed entity, the second entity and the seed entity forming a seed cluster; identifying properties associated with the second entity and the seed entity; generating a search query that is associated with a subset of the identified properties; determining that the seed entity is associated with a third entity; and in response to the determination that the seed entity is associated with the third entity: determining degrees of difference between: a first link between the seed entity and the second entity; and a second link between the third entity and a fourth entity validated to be linked to the third entity; determining a probability of a match between one or more types of the identified properties and a particular backend datasource against which the search query is run, selected from different backend datasources; and creating a second search query based on the determined degrees of difference and the determined probability of the match.

2. The system of claim 1, wherein the instructions further cause the system to: determine a frequency at which the third entity appears across one or more backend datasources; and wherein the creating of the second search query is further based on the frequency.

3. The system of claim 2, wherein the creating of the second search query comprises: selecting a highest-scoring query, wherein a score of the highest-scoring query is determined based on the degrees of difference, the determined probability of a match, and the frequency; and in response to selecting a highest-scoring query, selecting a next highest-scoring query.

4. The system of claim 1, wherein the instructions further cause the system to: determine a second degree of difference between: the second entity or the seed entity; and the third entity; and wherein the creating of the second search query is based on the second degree of difference.

5. The system of claim 1, wherein the instructions further cause the system to: conduct the second search query; determine probabilities that respective results of the second search query are spurious based on a number of the results; determine whether to discard a subset of the results based on the determined probabilities; and selectively discard the subset of the results based on the determination of whether to discard the subset.

6. The system of claim 1, wherein the first link indicates a first relationship between the seed entity and the second entity and the second link indicates a second relationship between the third entity and the fourth entity.

7. The system of claim 1, wherein the second search query corresponds to the third entity.

8. The system of claim 1, wherein the instructions, when executed, further cause the system to: create a third search query based on a misspelling of the third entity.

9. The system of claim 1, wherein the seed entity comprises a pseudonym.

10. The system of claim 1, wherein the instructions further cause the system to: determine second degrees of difference between: the seed entity and the second entity; and the third entity and the fourth entity; and wherein the second search query is created based on the determined second degrees of difference.

11. The method of claim 1, further comprising determining a frequency at which the third entity appears across one or more backend datasources; and wherein the creating of the second search query is further based on the frequency.

12. A computer-implemented method comprising: generating a plurality of search queries comprising a seed entity and one or more entities associated with the seed entity, the generation comprising: determining a second entity validated to be linked to the seed entity, the second entity and the seed entity forming a seed cluster; identifying properties associated with the second entity and the seed entity; generating a search query that is associated with a subset of the identified properties; determining that the seed entity is associated with a third entity; and in response to the determination that the seed entity is associated with the third entity: determining degrees of difference between: a first link between the seed entity and the second entity; and a second link between the third entity and a fourth entity validated to be linked to the third entity; determining a probability of a match between one or more types of the identified properties and a particular backend datasource against which the search query is run, selected from different backend datasources; and creating a second search query based on the determined degrees of difference and the determined probability of the match.

13. The method of claim 12, further comprising determining a second degree of difference between: the second entity or the seed entity; and the third entity; and wherein the creating of the second search query is based on the second degree of difference.

14. The method of claim 12, further comprising: conducting the second search query; determining probabilities that respective results of the second search query are spurious based on a number of the results; determining whether to discard a subset of the results based on the determined probabilities; and selectively discarding the subset of the results based on the determination of whether to discard the subset.

15. The method of claim 12, wherein the first link indicates a first relationship between the seed entity and the second entity and the second link indicates a second relationship between the third entity and the fourth entity.

16. The method of claim 12, wherein the second search query corresponds to the third entity.

17. The method of claim 12, further comprising creating a third search query based on a misspelling of the third entity.

18. The method of claim 12, further comprising: determining second degrees of difference between: the seed entity and the second entity; and the third entity and the fourth entity; and wherein the second search query is created based on the determined second degrees of difference.

19. A non-transitory computer readable medium comprising instructions that, when executed, cause one or more processors to perform: generating a plurality of search queries comprising a seed entity and one or more entities associated with the seed entity, the generation comprising: determining a second entity validated to be linked to the seed entity, the second entity and the seed entity forming a seed cluster; identifying properties associated with the second entity and the seed entity; generating a search query that is associated with a subset of the identified properties; determining that the seed entity is associated with a third entity; and in response to the determination that the seed entity is associated with the third entity: determining degrees of difference between: a first link between the seed entity and the second entity; and a second link between the third entity and a fourth entity validated to be linked to the third entity; determining a probability of a match between one or m…
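Claims 1 to 3 and claim 5 describe query ranking and result filtering without fixing any formula, so the Python sketch below assumes one. Each candidate query carries a link difference and a datasource match probability in [0, 1] plus a frequency count for the third entity, and the assumed score rewards similar link structure, a likely datasource match, and rarer (more discriminating) entities. Sorting by that score and taking the first `budget` entries gives the same order as the claim 3 loop of selecting the highest-scoring query and then the next highest. The spurious-result filter assumes at most one genuine hit per query, so a query returning n results gets a spurious probability of 1 - 1/n, and the whole set is discarded above a threshold. The names `CandidateQuery`, `query_score`, `select_queries`, and `filter_spurious`, and the threshold value, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateQuery:
    """A candidate second search query built around a third entity (hypothetical fields)."""
    text: str
    link_difference: float    # degree of difference between the two links, assumed in [0, 1]
    source_match_prob: float  # probability the property types match the target datasource
    entity_frequency: int     # how often the third entity appears across backend datasources


def query_score(q):
    # Assumed formula: similar links, likely datasource match, and rare entities score higher.
    rarity = 1.0 / (1 + q.entity_frequency)
    return (1.0 - q.link_difference) * q.source_match_prob * rarity


def select_queries(candidates, budget):
    # Greedy order: highest-scoring query first, then the next highest, up to the budget.
    return sorted(candidates, key=query_score, reverse=True)[:budget]


def filter_spurious(results, threshold=0.9):
    # Assumption: at most one genuine hit is expected, so P(spurious) = 1 - 1/n.
    n = len(results)
    if n == 0:
        return []
    p_spurious = 1.0 - 1.0 / n
    # Discard the whole result set when it is probably noise (e.g. a very common name).
    return [] if p_spurious > threshold else list(results)
```

With the default threshold, a query that returns ten or fewer results is kept and one that returns more is dropped; a real system would more likely estimate the expected number of genuine hits per datasource.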

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • G06F16/951 (Primary)

    Indexing; Web crawling techniques · CPC title

  • G06F16/38 (Primary)

    Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title

  • Presentation of query results · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11714869B2 cover?
Systems and methods are provided for identifying relevant information for an entity, referred to as a seed entity. A plurality of search queries can be generated each comprising a property of a seed entity or one of the entities associated with the seed entity (seed-linked entities). Preferably, a collection of search queries includes ones representing different properties of the seed entity an…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
It was published on Aug 1, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).