Iterative query expansion for document discovery

US11720554B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11720554-B2
Application number: US-202117142491-A
Country: US
Kind code: B2
Filing date: Jan 6, 2021
Priority date: Jan 6, 2021
Publication date: Aug 8, 2023
Grant date: Aug 8, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An embodiment for expanding a search query is provided. The embodiment may include receiving a stopping criterion for stopping a search. The embodiment may also include receiving an initial search query. The embodiment may further include submitting the initial search query to an information retrieval system. The embodiment may also include identifying enrichment terms from the retrieved initial set of documents. The embodiment may further include generating a subsequent search query that includes one or more enrichment terms from the retrieved initial set of documents. The embodiment may also include submitting the subsequent search query to the information retrieval system. The embodiment may further include determining whether the stopping criterion is met, and in response to determining the stopping criterion is not met, iterating identifying, generating, submitting steps until the stopping criterion is met.
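The loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory `CORPUS`, the term-overlap `retrieve` function, the frequency-based `enrichment_terms` selector, and the `stop_when_exhausted` criterion are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from collections import Counter
from typing import Callable

# Hypothetical toy corpus standing in for an information retrieval system.
CORPUS = {
    "d1": "solar panel efficiency silicon",
    "d2": "silicon wafer manufacturing cost",
    "d3": "wafer defects inspection camera",
    "d4": "garden tomato soil",
}


def retrieve(query: set[str]) -> set[str]:
    """Return ids of documents that share at least one term with the query."""
    return {doc_id for doc_id, text in CORPUS.items() if query & set(text.split())}


def enrichment_terms(doc_ids: set[str], query: set[str], k: int = 2) -> list[str]:
    """Assumed heuristic: the k most frequent retrieved terms not already in the query."""
    counts = Counter(
        term
        for doc_id in doc_ids
        for term in CORPUS[doc_id].split()
        if term not in query
    )
    # Sort by descending count, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def stop_when_exhausted(iteration: int, terms: list[str]) -> bool:
    """Assumed stopping criterion: no new terms were found, or 10 iterations ran."""
    return not terms or iteration >= 10


def expand(
    initial_query: set[str], stop: Callable[[int, list[str]], bool]
) -> tuple[set[str], set[str]]:
    """Run the identify / generate / submit loop until the stopping criterion is met."""
    query = set(initial_query)
    docs = retrieve(query)  # submit the initial search query
    iteration = 0
    while True:
        terms = enrichment_terms(docs, query)  # identify enrichment terms
        query = query | set(terms)             # generate the subsequent query
        docs = retrieve(query)                 # submit the subsequent query
        iteration += 1
        if stop(iteration, terms):             # check the stopping criterion
            return query, docs


final_query, final_docs = expand({"solar"}, stop_when_exhausted)
print(sorted(final_query))
print(sorted(final_docs))
```

In this toy setup the query grows from a single term until the retrieved documents contribute no unseen terms, and the unrelated document `d4` is never reached. A real system would replace the frequency heuristic with entity, keyword, concept, or topic extraction.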

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-based method of expanding a search query, the method comprising: receiving an initial search query and a stopping criterion for stopping a search; submitting the initial search query to an information retrieval system, the information retrieval system retrieving an initial set of documents responsive to the initial search query; identifying enrichment terms from the retrieved initial set of documents; generating a subsequent search query that includes one or more enrichment terms from the retrieved initial set of documents; submitting the subsequent search query to the information retrieval system, the information retrieval system retrieving a subsequent set of documents responsive to the subsequent search query; determining whether the stopping criterion is met, wherein the stopping criterion is the number of identified enrichment terms from the retrieved initial set of documents, wherein the number of identified enrichment terms from the retrieved initial set of documents is equivalent to the number of iterations, and wherein each search query contains search terms of the initial search query and one enrichment term; and in response to determining the stopping criterion is not met, iterating, until the stopping criterion is met: identifying updated enrichment terms from the set of documents retrieved in a most recent previous search query; generating a new search query that includes enrichment terms from a most recent previous set of documents; and submitting the new search query to the information retrieval system to retrieve another set of documents responsive to the new search query.
2. The method of claim 1, further comprising: in response to determining the stopping criterion is met, identifying high-leverage query terms in the new search query.
3. The method of claim 2, wherein the new search query has one or more of the following characteristics: a set of enrichment terms different from the enrichment terms from the most recent previous set of documents; and a set of enrichment terms ranked by an enrichment relevance score.
4. The method of claim 1, wherein the stopping criterion is selected from a group consisting of manually set by a user, and the identified updated enrichment terms from the most recent previous search query match the enrichment terms of the new search query.
5. The method of claim 1, wherein each iteration of the subsequent search query is one or more cumulatively expanded search queries of the subsequent search query.
6. The method of claim 5, wherein each iteration of the one or more cumulatively expanded search queries further comprises: identifying a fixed sequence length of a plurality of cumulatively expanded search queries; executing a consecutive sequence of the plurality of cumulatively expanded search queries; identifying a number of additional documents retrieved from each subsequent cumulatively expanded search query; determining whether the retrieved additional documents and the retrieved most recent previous set of documents converge; and in response to determining the retrieved additional documents and the most recent previous set of documents do not converge, discontinuing the cumulative expansion of the one or more cumulatively expanded search queries.
7. The method of claim 1, wherein the enrichment terms are selected from a group consisting of discovered entities, keywords, concepts, and topics.
8. A computer system, the computer system comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage medium, and program instructions stored on at least one of the one or more tangible storage medium for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising: receiving an initial search query and a stopping criterion for stopping a search; submitting the initial search query to an information retrieval system, the information retrieval system retrieving an initial set of documents responsive to the initial search query; identifying enrichment terms from the retrieved initial set of documents; generating a subsequent search query that includes one or more enrichment terms from the retrieved initial set of documents; submitting the subsequent search query to the information retrieval system, the information retrieval system retrieving a subsequent set of documents responsive to the subsequent search query; determining whether the stopping criterion is met, wherein the stopping criterion is the number of identified enrichment terms from the retrieved initial set of documents, wherein the number of identified enrichment terms from the retrieved initial set of documents is equivalent to the number of iterations, and wherein each search query contains search terms of the initial search query and one enrichment term; and in response to determining the stopping criterion is not met, iterating, until the stopping criterion is met: identifying updated enrichment terms from the set of documents retrieved in a most recent previous search query; generating a new search query that includes enrichment terms from a most recent previous set of documents; and submitting the new search query to the information retrieval system to retrieve another set of documents responsive to the new search query.
9. The computer system of claim 8, further comprising: in response to determining the stopping criterion is met, identifying high-leverage query terms in the new search query.
10. The computer system of claim 9, wherein the new search query has one or more of the following characteristics: a set of enrichment terms different from the enrichment terms from the most recent previous set of documents; and a set of enrichment terms ranked by an enrichment relevance score.
11. The computer system of claim 8, wherein the stopping criterion is selected from a group consisting of manually set by a user, and the identified updated enrichment terms from the most recent previous search query match the enrichment terms of the new search query.
12. The computer system of claim 8, wherein each iteration of the subsequent search query is one or more cumulatively expanded search queries of the subsequent search query.
13. The computer system of claim 12, wherein each iteration of the one or more cumulatively expanded search queries further comprises: identifying a fixed sequence length of a plurality of cumulatively expanded search queries; executing a consecutive sequence of the plurality of cumulatively expanded search queries; identifying a number of additional documents retrieved from each subsequent cumulatively expanded search query; determining whether the retrieved additional documents and the retrieved most recent previous set of documents converge; and in response to determining the retrieved additional documents and the most recent previous set of documents do not converge, discontinuing the cumulative expansion of the one or more cumulatively expanded search queries.
14. The computer system of claim 8, wherein the enrichment terms are selected from a group consisting of discovered entities, keywords, concepts, and topics.
15. A computer program product, the computer program product comprising: one or more computer-readable storage medium and program instructions stored on at least one of the one or more storage medium, the program instructions executable by a processor capable of performing a method, the method comprising: receiv
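Two query shapes appear in the claims: queries made of the initial terms plus exactly one enrichment term (claim 1), and cumulatively expanded queries whose additional retrieved documents are counted (claims 5 and 6). The sketch below illustrates both under loud assumptions: `DOCS`, `retrieve`, and the hand-picked enrichment list are hypothetical, and the code only computes the per-query count of additional documents. How those counts feed a convergence decision is not specified in the preview text and is left open here.

```python
# Hypothetical toy index: document id -> set of terms.
DOCS = {
    "d1": {"battery", "anode", "graphite"},
    "d2": {"anode", "silicon"},
    "d3": {"graphite", "coating"},
    "d4": {"silicon", "coating"},
}


def retrieve(query: list[str]) -> set[str]:
    """Return ids of documents that contain at least one query term."""
    return {doc_id for doc_id, terms in DOCS.items() if terms & set(query)}


def single_term_queries(initial: list[str], enrichment: list[str]) -> list[list[str]]:
    """One query per enrichment term: the initial terms plus exactly one enrichment term."""
    return [list(initial) + [term] for term in enrichment]


def cumulative_queries(initial: list[str], enrichment: list[str]) -> list[list[str]]:
    """Each query keeps the previous enrichment terms and adds one more."""
    return [list(initial) + enrichment[:i] for i in range(1, len(enrichment) + 1)]


def additional_docs_per_query(queries: list[list[str]], seen: set[str]) -> list[int]:
    """Count how many not-yet-seen documents each query in the sequence retrieves."""
    seen = set(seen)  # copy so the caller's set is not mutated
    counts = []
    for query in queries:
        new_docs = retrieve(query) - seen
        counts.append(len(new_docs))
        seen |= new_docs
    return counts


initial = ["battery"]
initial_docs = retrieve(initial)
enrichment = ["anode", "graphite"]  # assumed to come from the initial document set

per_term = single_term_queries(initial, enrichment)
cumulative = cumulative_queries(initial, enrichment)
print(per_term)
print(cumulative)
print(additional_docs_per_query(cumulative, initial_docs))
```

With one query per enrichment term, the number of queries equals the number of enrichment terms, which mirrors the iteration count described in claim 1.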

Assignees

Inventors

Classifications

  • Iterative querying; Query formulation based on the results of a preceding query

  • using ranking

  • Document management systems

  • Reformulation based on results of preceding query

  • Selection or weighting of terms from queries, including natural language queries

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11720554B2 cover?
An embodiment for expanding a search query is provided. The embodiment may include receiving a stopping criterion for stopping a search. The embodiment may also include receiving an initial search query. The embodiment may further include submitting the initial search query to an information retrieval system. The embodiment may also include identifying enrichment terms from the retrieved initia…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/2425. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 8, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).