Top-k query processing with conditional skips

US10459959B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10459959-B2
Application numberUS-201615345277-A
CountryUS
Kind codeB2
Filing dateNov 7, 2016
Priority dateNov 7, 2016
Publication dateOct 29, 2019
Grant dateOct 29, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus for performing top-k query processing include pruning a list of documents to identify a subset of the list of documents, where pruning includes, for other query terms in the set of query terms, skipping a document in the list of documents based, at least in part, on the contribution of the query term to the score of the corresponding document and the term upper bound for each other query term, in the set of query terms, that matches the document.

First claim

Opening claim text (preview).

What is claimed is: 1. A method, comprising: obtaining, by one or more servers, a set of query terms of a search query; identifying, by the one or more servers, a list of documents pertaining to the set of query terms, wherein each document in the list of documents is identified by a document identifier, wherein each query term in the set of query terms has associated therewith a posting list of elements representing documents including the query term, wherein each element in the posting list indicates the corresponding document identifier and a contribution of the query term to a score of the document, and wherein each query term in the set of query terms has a corresponding term upper bound on its potential contribution to at least one document in the list of documents; organizing the list of documents using a tree data structure, each of a plurality of nodes of the tree data structure representing a corresponding document heap of a plurality of document heaps; and pruning, by the one or more servers, the list of documents to identify a subset of the list of documents, wherein pruning includes, for each query term in the set of query terms, skipping a document in the list of documents based, at least in part, on the contribution of the query term to the score of the corresponding document and a term upper bound for each other query term, in the set of query terms, that matches the document, wherein pruning includes traversing the tree data structure. 2. The method of claim 1 , further comprising: for a query term in the set of query terms: for an element in the corresponding posting list, summing the contribution of the query term to the score of the corresponding document and a term upper bound for each other query term, in the set of query terms, that matches the document to ascertain a total; and skipping the element within the corresponding posting list according to whether the total is lower than a lowest document score of a set of top scored documents of the list of documents. 3. The method of claim 2 , further comprising: updating the set of top scored documents to include the document based, at least in part, on the total. 4. The method of claim 1 , further comprising: for a query term in the set of query terms: summing a term upper bound for each of the other query terms in the set of query terms to ascertain a total; and skipping an element within the corresponding posting list based, at least in part, on whether the total is lower than a lowest document score of a set of top scored documents of the list of documents. 5. The method of claim 1 , each of the document heaps corresponding to a different set of one or more documents in the list of documents. 6. The method of claim 1 , further comprising: ascertaining, for each posting list, for each element of the corresponding posting list, the contribution of the corresponding query term to the score of the document represented by the element. 7. The method of claim 1 , further comprising: for each posting list, sorting the corresponding elements according to the document identifier of each of the respective elements. 8. The method of claim 1 , further comprising: for each posting list, sorting the corresponding elements according to the contribution of the query term to the scores of the corresponding documents. 9. A system, comprising: one or more servers; the one or more servers including one or more processors and one or memories, the one or more servers being configured to: obtain a set of query terms of a search query; identify a list of documents pertaining to the set of query terms, wherein each document in the list of documents is identified by a document identifier, wherein each query term in the set of query terms has associated therewith a posting list of elements representing documents including the query term, wherein each element in the posting list indicates the corresponding document identifier and a contribution of the query term to a score of the document, and wherein each query term in the set of query terms has a corresponding term upper bound on its potential contribution to at least one document in the list of documents; organize the list of documents using a tree data structure such that each of a plurality of nodes of the tree data structure represents a corresponding document heap of a plurality of document heaps; and prune the list of documents to identify a subset of the list of documents, wherein pruning includes, for each query term in the set of query terms, skipping a document in the list of documents based, at least in part, on the contribution of the query term to the score of the corresponding document and a term upper bound for each other query term, in the set of query terms, that matches the document, wherein pruning includes traversing the tree data structure. 10. The system of claim 9 , the one or more servers being further configured to: for a query term in the set of query terms: for an element in the corresponding posting list, sum the contribution of the query term to the score of the corresponding document and a term upper bound for each other query term, in the set of query terms, that matches the document to ascertain a total; and skip the element within the corresponding posting list according to whether the total is lower than a lowest document score of a set of top scored documents of the list of documents. 11. The system of claim 10 , the one or more servers being further configured to: update the set of top scored documents to include the document based, at least in part, on the total. 12. The system of claim 9 , the one or more servers being further configured to: for a query term in the set of query terms: sum a term upper bound for each of the other query terms in the set of query terms to ascertain a total; and skip an element within the corresponding posting list based, at least in part, on whether the total is lower than a lowest document score of a set of top scored documents of the list of documents. 13. The system of claim 9 , each of the document heaps corresponding to a different set of one or more documents in the list of documents. 14. The system of claim 9 , the one or more servers being further configured to: ascertain, for each posting list, for each element of the corresponding posting list, the contribution of the corresponding query term to the score of the document represented by the element. 15. The system of claim 9 , the one or more servers being further configured to: for each posting list, sort the corresponding elements according to the document identifier of each of the respective elements. 16. The system of claim 9 , the one or more servers being further configured to: for each posting list, sort the corresponding elements according to the contribution of the query term to the scores of the corresponding documents. 17. A computer program product comprising one or more non-transitory computer-readable media having computer program instructions stored therein, the computer program instructions being configured such that, when executed by one or more computing devices, the computer program instructions cause the one or more computing devices to: obtain a set of query terms of a search query; identify a list of documents pertaining to the set of query terms, wherein each document in the list of documents is identified by a document identifier, wherein each query term in the set of query terms has associated therewith a posting list of elements representing documents including the query te

Assignees

Inventors

Classifications

  • G06F16/334Primary

    Query execution (filtering based on additional data G06F16/335) · CPC title

  • Document management systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10459959B2 cover?
Methods and apparatus for performing top-k query processing include pruning a list of documents to identify a subset of the list of documents, where pruning includes, for other query terms in the set of query terms, skipping a document in the list of documents based, at least in part, on the contribution of the query term to the score of the corresponding document and the term upper bound for e…
Who is the assignee on this patent?
Oath Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/334. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 29 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).