Using historical information to improve search across heterogeneous indices

US8996561B2 · US · B2

Patent metadata
Publication number: US-8996561-B2
Application number: US-53533009-A
Country: US
Kind code: B2
Filing date: Aug 4, 2009
Priority date: Aug 4, 2009
Publication date: Mar 31, 2015
Grant date: Mar 31, 2015

Abstract

Official abstract text for this publication.

A method, system and computer program product are disclosed for searching for data. In one embodiment, the invention provides a method comprising identifying a query and a search scope including a set of specified entities; and for each of these entities, estimating a number of documents that would be identified in a search through the entity to answer the query. On the basis of this estimating, a subset of the entities is formed. The query and this subset of entities are sent to a search engine to search the subset of entities to answer the query. In one embodiment, the estimating includes collecting statistical information from queries to build up a historical cache using heuristics or machine learning techniques, wherein the query includes a key word and a scope, and the historical cache contains a maximum number of returned results for an entity given the queries executed.
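
As a rough illustration of the historical cache described in the abstract, the Python sketch below keeps, for each entity, the largest number of results returned by any earlier search and uses that number as the estimate for the next query. The class name `HistoricalCache`, its method names, and the `default_estimate` fallback for entities with no history are assumptions made for this example; the text on this page does not specify them.

```python
class HistoricalCache:
    """Hypothetical cache: per entity, the largest hit count seen so far."""

    def __init__(self, default_estimate=0):
        # Assumption: entities never searched before get a fixed fallback value.
        self.default_estimate = default_estimate
        self._max_hits = {}

    def record(self, entity, hit_count):
        """Store the result count of a finished search, keeping only the maximum."""
        previous = self._max_hits.get(entity, 0)
        self._max_hits[entity] = max(previous, hit_count)

    def estimate(self, entity):
        """Return the estimated number of documents a search of `entity` returns."""
        return self._max_hits.get(entity, self.default_estimate)


cache = HistoricalCache(default_estimate=100)
for hits in (12, 40, 7):
    cache.record("wiki", hits)

print(cache.estimate("wiki"))     # 40, the largest count recorded
print(cache.estimate("tickets"))  # 100, the fallback for an unseen entity
```

Keeping only the maximum makes the estimate a conservative upper bound based on history, which matches the abstract's "maximum number of returned results for an entity given the queries executed".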

First claim

Opening claim text (preview).

What is claimed is:

1. A method of searching for data, comprising: identifying a query and a search scope including a set of specified entities, each of the entities including one or more documents; storing in a historical cache results from previous searches through each of the specified entities including for each of the specified entities, storing in the historical cache a number that is the largest number of the documents identified in the each entity during any of the previous searches through the each entity; for each of said specified entities, estimating a number of the documents included in said each entity that would be identified in a search through said each entity to answer said query, including using said number, from the historical cache, that is the largest number of the documents identified in the each entity during the any of the previous searches through the each entity, as an estimated number of return documents included in said each entity; forming a subset of said entities based on the estimated number of return documents included in each of the entities; and sending said query and said subset of said entities to a search engine to search said subset of said entities to answer said query.

2. The method according to claim 1, wherein: the forming includes rewriting said query based on the historical cache; and the sending said query and said subset of entities to the search engine includes executing the identified query, and updating the historical cache based on the results of executing the identified query.

3. The method according to claim 1, wherein the total of the estimated number of documents for all of the entities in the subset of entities is not more than a given number.

4. The method according to claim 1, wherein the forming includes: arranging the specified entities in a defined order; and forming the subset of the entities based on said defined order.

5. The method according to claim 4, wherein: the arranging includes arranging the specified entities in order based on said number, from the historical cache, that is the largest number of the documents identified in each of the specified entities during any of the previous searches through the each entity; and the forming includes adding the specified entities to the subset, in ascending order of said number, from the historical cache, that is the largest number of the documents identified in each of the specified entities during any of the previous searches through the each entity, as long as a specified criteria is met.

6. The method according to claim 5, wherein the specified criteria is that the total of the estimated number of documents that would be returned for the entities in the subset cannot exceed a given number.

7. The method according to claim 1, wherein the forming includes forming the subset of entities with at least a given minimum number of the specified entities.

8. The method according to claim 1, further comprising forming additional subsets of the specified entities, and sending said query and said additional subsets of the entities to the search engine until all of the specified entities have been included in one of the subsets of entities and sent to said search engine.

9. The method according to claim 1, further comprising: receiving a first group of documents from the search engine, said first group of documents linking to a group of one or more of the specified entities of the subset of the specified entities; comparing the difference between the number of entities in said subset of entities sent to said search engine and the number of entities in said group of linked entities to a given threshold number; when said difference is not less than the given threshold number, receiving a second group of documents from the search engine; and when said difference is less than the threshold number, forming another subset of the entities and sending said query and said another subset of the entities to the search engine.

10. A system for searching for data, comprising one or more processing units configured for: receiving a query and a search scope including a set of specified entities, each of the entities including one or more documents; storing in a historical cache results from previous searches through each of the specified entities, including for each of the specified entities, storing in the historical cache a number that is the largest number of the documents identified in the each entity during any of the previous searches through the each entity; for each of said specified entities, estimating a number of the documents included in said each entity that would be identified in a search through said each entity to answer said query, including using said number, from the historical cache, that is the largest number of the documents identified in the each entity during the any of the previous searches through the each entity, as an estimated number of return documents included in said each entity; forming a subset of said entities based on the estimated number of return documents included in each of the entities; and sending said query and said subset of said entities to a search engine to search said subset of said entities to answer said query.

11. The system according to claim 10, wherein the total of the estimated number of documents for all of the entities in the subset of entities is not more than a given number.

12. The system according to claim 10, wherein the forming includes: arranging the specified entities in a defined order; and forming the subset of the entities based on said defined order.

13. The system according to claim 10, wherein said one or more processing units are further configured for forming additional subsets of the specified entities, and sending said query and said additional subsets of the entities to the search engine until all of the specified entities have been included in one of the subsets of entities and sent to said search engine.

14. An article of manufacture comprising: at least one computer usable device having computer readable program code logic tangibly embodied therein to execute instructions in a processing unit for searching for data, said computer readable program code logic, when executing, performing the following: receiving a query and a search scope including a set of specified entities, each of the entities including one or more documents; storing in a historical cache results from previous searches through each of the specified entities, including for each of the specified entities, storing in the historical cache a number that is the largest number of the documents identified in the each entity during any of the previous searches through the each entity; for each of said specified entities, estimating a number of the documents included in said each entity that would be identified in a search through said each entity to answer said query, including using said number, from the historical cache, that is the largest number of the documents identified in the each entity during the any of the previous searches through the each entity, as an estimated number of return documents included in said each entity; forming a subset of said entities based on the estimated number of return documents included in each of the entities; and sending said query and said subset of said entities to a search engine to search said subset of said entities to answer said query.

15. The article of manufacture according to claim 14, wherein the forming includes: arranging the specified entities in a defined order; and forming the subset of the enti
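
One possible reading of claims 3 through 8 is sketched below in Python: entities are sorted in ascending order of their estimated result counts, added to a subset while the running total stays within a given limit, and the process repeats until every entity has been placed in some subset. The function name `form_subsets`, the parameters `max_total` and `min_entities`, and the rule that the minimum subset size takes precedence over the limit are assumptions for illustration only; the claims do not say how those two constraints interact.

```python
def form_subsets(estimates, max_total, min_entities=1):
    """Yield subsets of entities, smallest estimates first.

    estimates    -- dict mapping entity name to estimated result count
    max_total    -- hypothetical cap on the summed estimates of one subset
    min_entities -- hypothetical minimum subset size (overrides the cap)
    """
    # Ascending order of the estimate; the name breaks ties deterministically.
    ordered = sorted(estimates, key=lambda name: (estimates[name], name))
    batch, total = [], 0
    for entity in ordered:
        estimate = estimates[entity]
        # Close the current subset once it is large enough and the next
        # entity would push the summed estimate past the cap.
        if len(batch) >= min_entities and total + estimate > max_total:
            yield batch
            batch, total = [], 0
        batch.append(entity)
        total += estimate
    if batch:
        yield batch


estimates = {"a": 5, "b": 50, "c": 10, "d": 40}
for subset in form_subsets(estimates, max_total=60):
    # In a real system, the query and this subset would go to the search engine.
    print(subset)
# ['a', 'c', 'd']
# ['b']
```

Because a single entity whose estimate exceeds `max_total` is still placed in a subset of its own, every entity is eventually sent to the search engine, as claim 8 requires.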

Classifications

  • G06F16/951 · Primary

    Indexing; Web crawling techniques · CPC title

  • Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title

  • Access plan code generation and invalidation; Reuse of access plans · CPC title

  • Details of hyperlinks; Management of linked annotations · CPC title

  • of access to content, e.g. by caching · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8996561B2 cover?
A method, system and computer program product are disclosed for searching for data. In one embodiment, the invention provides a method comprising identifying a query and a search scope including a set of specified entities; and for each of these entities, estimating a number of documents that would be identified in a search through the entity to answer the query. On the basis of this estimating…
Who is the assignee on this patent?
Deng Yu, Devarakonda Murthy V, Hosn Rafah A, and 3 more
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 31, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).