Ranking collections of document passages associated with an entity name by relevance to a query

US11226972B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11226972-B2
Application number  US-201916278856-A
Country             US
Kind code           B2
Filing date         Feb 19, 2019
Priority date       Feb 19, 2019
Publication date    Jan 18, 2022
Grant date          Jan 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Query service receives a query comprising at least a name component. The query service searches a document corpus to identify multiple passages, each comprising a mention of the name component within a selection of one or more documents of the document corpus. The query service collects bins, each bin comprising a distinct selection of the passages from the one or more documents, each of the bins identifying a separate relationship the name component participates in within the distinct selection of passages. The query service assesses a separate score of each respective bin reflecting the relevance of each respective bin to the query. The query service returns a response to the query with the bins each ranked according to each separate score.
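The flow in the abstract (find passages that mention the name, group them into relationship bins, score each bin, return the bins ranked) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `relationship` label on each passage is assumed to be produced by an earlier extraction step, and the function names and the keyword-based placeholder score are hypothetical.

```python
from collections import defaultdict


def collect_bins(passages, name):
    """Group passages that mention `name` by their relationship label.

    Assumption: each passage is a dict with a "text" field and a
    "relationship" label already attached by some upstream extractor.
    """
    bins = defaultdict(list)
    for passage in passages:
        if name.lower() in passage["text"].lower():
            bins[passage["relationship"]].append(passage["text"])
    return dict(bins)


def score_bin(texts, query_terms):
    """Placeholder relevance: fraction of passages containing any query term."""
    if not texts:
        return 0.0
    hits = sum(
        any(term.lower() in text.lower() for term in query_terms)
        for text in texts
    )
    return hits / len(texts)


def rank_bins(passages, name, query_terms):
    """Return (relationship, score, texts) tuples, highest score first."""
    bins = collect_bins(passages, name)
    scored = [
        (relationship, score_bin(texts, query_terms), texts)
        for relationship, texts in bins.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical example data.
passages = [
    {"text": "John Smith is the CEO of Acme Corp.", "relationship": "employer:Acme"},
    {"text": "John Smith joined Acme in 2010.", "relationship": "employer:Acme"},
    {"text": "John Smith plays guitar in a Boston band.", "relationship": "member:band"},
    {"text": "Jane Doe founded Initech.", "relationship": "founder:Initech"},
]
ranked = rank_bins(passages, "John Smith", ["Acme"])
for relationship, score, texts in ranked:
    print(relationship, score, len(texts))
```

Each bin here stands for one candidate entity behind the shared name, so ranking bins rather than individual passages keeps the evidence for each candidate together.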

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, by a computer system, a query comprising at least a name component and one or more identifier components identifying one or more particular entities associated with the name component; searching, by the computer system, a plurality of documents in a document corpus to identify a plurality of passages each comprising a mention of the name component within a selection of one or more documents of the plurality of documents; collecting, by the computer system, a plurality of bins each comprising a distinct selection of the plurality of passages from the one or more documents, each of the plurality of bins identifying a separate relationship the name component and the one or more identifier components participate in within the distinct selection of the plurality of passages; extracting, by the computer system, from each of the one or more identifier components in each respective bin, an identifier string and a desired relationship; calculating, by the computer system, one or more match values for each of the one or more identifier components in each respective bin, the one or more match values based, at least in part, on: (i) the desired relationship, and (ii) a closeness of the identifier string to the mention; assessing, by the computer system, a separate score of each respective bin of the plurality of bins reflecting the relevance of each respective bin to the query, the separate score based, at least in part, on the calculated one or more match values for the one or more identifier components in the respective bin; and returning, by the computer system, a response to the query with the plurality of bins each ranked according to each separate score.

2. The method of claim 1, wherein the one or more match values are further based, at least in part, on whether a fuzzy match to the identifier string is acceptable.

3. The method of claim 1, wherein the one or more match values are further based, at least in part, on one or more identifier ranking characteristics.

4. The method of claim 3, further comprising: evaluating, by the computer system, a ranking characteristic of the one or more identifier ranking characteristics such that an extracted relationship of the identifier string to the mention that matches the desired relationship scores higher than an occurrence of the identifier string to the mention that does not match the desired relationship.

5. The method of claim 3, further comprising: evaluating, by the computer system, a ranking characteristic of the one or more identifier ranking characteristics such that an occurrence of the identifier string closer to the mention scores higher than the occurrence of the identifier string farther from the mention.

6. The method of claim 5, wherein evaluating the ranking characteristic of the one or more ranking characteristics comprises: evaluating, by the computer system, a nearness of the occurrence of the identifier string to the mention based on a nearness percentage of the identifier string to the mention in view of a maximum number of words set to a maxwords value.

7. The method of claim 1, wherein: the query further comprises one or more association components each for indicating or counter-indicating a particular entity associated with the name component; and the separate score is further based, at least in part, on one or more additional match values calculated for each of the one or more identifier components in each respective bin based on one or more association ranking characteristics.

8. The method of claim 7, further comprising: calculating, by the computer system, the one or more additional match values for each of the one or more identifier components in each respective bin based on the one or more association ranking characteristics by extracting an association string and a desired relationship from each of the one or more association components; evaluating, by the computer system, a first ranking characteristic of one or more association ranking characteristics such that an extracted relationship of the association string to the mention that matches the desired relationship scores higher than an occurrence of the association string to the mention that does not match the desired relationship; and evaluating, by the computer system, a second ranking characteristic of one or more association ranking characteristics such that a higher relative frequency of the association string in one bin of the plurality of bins scores higher than a lower relative frequency of the association string in another bin of the plurality of bins.

9. A computer system comprising one or more processors, one or more computer-readable memories, one or more computer-readable storage devices, and program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, the stored program instructions comprising: program instructions to receive a query comprising at least a name component and one or more identifier components identifying one or more particular entities associated with the name component; program instructions to search a plurality of documents in a document corpus to identify a plurality of passages each comprising a mention of the name component within a selection of one or more documents of the plurality of documents; program instructions to collect a plurality of bins each comprising a distinct selection of the plurality of passages from the one or more documents, each of the plurality of bins identifying a separate relationship the name component and the one or more identifier components participate in within the distinct selection of the plurality of passages; program instructions to extract, from each of the one or more identifier components in each respective bin, an identifier string and a desired relationship; program instructions to calculate one or more match values for each of the one or more identifier components in each respective bin, the one or more match values based, at least in part, on: (i) the desired relationship, and (ii) a closeness of the identifier string to the mention; program instructions to assess a separate score of each respective bin of the plurality of bins reflecting the relevance of each respective bin to the query, the separate score based, at least in part, on the calculated one or more match values for the one or more identifier components in the respective bin; and program instructions to return a response to the query with the plurality of bins each ranked according to each separate score.

10. The computer system of claim 9, wherein the one or more match values are further based, at least in part, on whether a fuzzy match to the identifier string is acceptable.

11. The computer system of claim 9, wherein the one or more match values are further based, at least in part, on one or more identifier ranking characteristics.

12. The computer system of claim 11, the stored program instructions further comprising: program instructions to evaluate a ranking characteristic of the one or more identifier ranking characteristics such that an extracted relationship of the identifier string to the mention that matches the desired relationship scores higher than an occurrence of the identifier string to the mention that does not match the desired relationship.

13. The computer system of claim 11, the stored program instructions further comprising: program instructions to evaluate a ranking characteristic of the one or more identifier ranking characteristics such that an occurrence of the iden
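Claims 1 and 4 to 6 describe a match value that depends on whether the desired relationship is matched and on how close the identifier string sits to the mention, measured against a maxwords limit. The claims do not give a formula, so the sketch below is only one plausible reading: it assumes a linear nearness percentage of `1 - distance / maxwords`, single-token mentions and identifiers, and an equal-weight blend of the two signals. All function names, the `relationship_weight` parameter, and the default values are hypothetical.

```python
def nearness_percentage(tokens, mention, identifier, maxwords=10):
    """Return a 0..1 nearness of `identifier` to `mention`, counted in words.

    Assumed formula: 1 - distance / maxwords, floored at 0, using the
    closest pair of occurrences. Returns 0.0 if either token is absent.
    """
    mention_positions = [i for i, tok in enumerate(tokens) if tok == mention]
    identifier_positions = [i for i, tok in enumerate(tokens) if tok == identifier]
    if not mention_positions or not identifier_positions:
        return 0.0
    distance = min(
        abs(m - i) for m in mention_positions for i in identifier_positions
    )
    return max(0.0, 1.0 - distance / maxwords)


def match_value(tokens, mention, identifier, extracted_relationship,
                desired_relationship, maxwords=10, relationship_weight=0.5):
    """Blend a relationship match with nearness into one match value.

    Assumption: a matching relationship contributes `relationship_weight`
    and nearness contributes the remainder, so a matching relationship
    always scores higher than a non-matching one at the same distance.
    """
    nearness = nearness_percentage(tokens, mention, identifier, maxwords)
    relationship_match = 1.0 if extracted_relationship == desired_relationship else 0.0
    return (relationship_weight * relationship_match
            + (1.0 - relationship_weight) * nearness)


# Hypothetical example passage, pre-tokenized on whitespace.
tokens = "Smith works as an engineer at Acme".split()
print(nearness_percentage(tokens, "Smith", "engineer"))  # closer identifier
print(nearness_percentage(tokens, "Smith", "Acme"))      # farther identifier
print(match_value(tokens, "Smith", "Acme", "employer", "employer"))
print(match_value(tokens, "Smith", "Acme", "customer", "employer"))
```

A bin score in the sense of claim 1 could then aggregate these match values over the passages in the bin, for example by taking their mean; that aggregation is likewise an assumption rather than something the claims specify.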

Assignees

Inventors

Classifications

  • using ranking · CPC title

  • G06F16/334 · Primary

    Query execution (filtering based on additional data G06F16/335) · CPC title

  • Fuzzy queries · CPC title

  • Document management systems · CPC title

  • Clustering or classification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11226972B2 cover?
Query service receives a query comprising at least a name component. The query service searches a document corpus to identify multiple passages, each comprising a mention of the name component within a selection of one or more documents of the document corpus. The query service collects bins, each bin comprising a distinct selection of the passages from the one or more documents, each of the bi…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/24578. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 18, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).