System and method for top-k searching using parallel processing

US11734285B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11734285-B2
Application numberUS-201815928723-A
CountryUS
Kind codeB2
Filing dateMar 22, 2018
Priority dateMar 22, 2018
Publication dateAug 22, 2023
Grant dateAug 22, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and programming for retrieving content items for a search are described herein. In a non-limiting embodiment, a query including a plurality of terms may be received. For each of the plurality of terms, a posting list of one or more content items may be obtained. The posting list may include a ranked list of term scores corresponding to the one or more content items, each of the term scores being indicative of a level of relevance of a corresponding content item to a term associated with the posting list. A list of relevant content items for the query may be determined based on the term scores in each posting list for the one or more content items identified with respect to each term. At least one of the relevant content items may be provided as a response to the query.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for retrieving documents for a search, the method being implemented on a computing device comprising a plurality of processors, memory, and a communication platform connected to a network, the method comprising: receiving a query comprising a plurality of terms; obtaining, for each of the plurality of terms, a posting list of one or more content items ranked based on term scores associated with a corresponding one or more content items, wherein a term score is indicative of a level of relevance between a corresponding content item in the posting list and the term; generating a candidate list of content items based on the plurality of posting lists, wherein the step of generating the candidate list comprises: selecting, from each of the posting lists, a first content item having a rank with a same first value in each of the posting lists, retrieving, for each of the first content items, from the respective posting lists, an associated term score of the first content item, and creating the candidate list with the first content items that are ranked based on their corresponding term scores; updating the candidate list by: selecting, from each of the posting lists, a next content item having a next rank with a value lower than a previous rank value, for each of the next content items, if the next content item is not in the candidate list, inserting a new entry in the candidate list for the next content item with its content item identifier and its corresponding term score, and if the next content item is in the candidate list, summing all available term scores associated with the next content item in all of the posting lists, wherein the rank values associated with the all available term scores in all of the posting lists are equal or higher than the next rank value, and re-ranking the candidate list based on the summed term scores of the next content item, and repeating the step of selecting, inserting, summing, and re-ranking until the candidate list has been updated based on all of the one or more content items in each of the posting lists; determining, based on the candidate list, the candidate content items; and providing, based on the candidate content items, a response to the query. 2. The method of claim 1 , wherein the step of generating the candidate list further comprises: determining, for each posting list, a first posting list entry, wherein each posting list is analyzed using a separate one of the plurality of processors; selecting a content item identifier associated with a first content item from the first posting list entry of each posting list; and extracting a corresponding term score for the first posting list entry of each posting list. 3. The method of claim 2 , further comprising: determining whether a data object associated with the first content item identifier exists in a data structure; and based on the determining, performing: generating the data object in response to determining an absence of any data objects in the data structure that are associated with the content item identifier associated with the first content item, wherein the data object stores the content item identifier and the corresponding term score for the first posting list entry of a corresponding posting list; or adding the corresponding term score of the content item identifier for the first posting list entry to the data structure in response to determining that the data object exists in the data structure. 4. The method of claim 1 , further comprising: determining, using a first processor of the plurality of processors, a first term score associated with a first content item of the one or more content items included in a first posting list of the plurality of posting lists, wherein the first content item is a first entry in the first posting list, and the first term score is greater than or equal to each additional term score included in the first posting list; determining that a data object representative of the first content item in a data structure exists; and adding the first term score to the data object, wherein the data object comprises the first term score associated with the first posting list and at least a second term score associated with a second content item included in a second posting list, the second posting list being analyzed using a second processor of the plurality of processors. 5. The method of claim 4 , further comprising: determining an upper bound term score for each posting list; and storing each upper bound term score in the data structure. 6. The method of claim 1 , further comprising: generating a content item map that stores a first listing of content item identifiers associated with each content item of the one or more content items analyzed from each posting list; determining that the cleaning condition has been satisfied; generating a temporary content item map that stores a second listing of content item identifiers comprising at least a portion of the first listing of content item identifiers, wherein the second listing of content item identifiers comprises content item identifiers having a corresponding lower bound term score that exceeds a threshold term score; and performing a compare and swap (“CAS”) operation to replace the first listing of content item identifiers with the second listing of content item identifiers. 7. A system comprising a plurality of processors, memory, and a communications platform in communication with a network for retrieving documents for a search, comprising: a query decomposition unit configured to receive a query comprising a plurality of terms; a plurality of query term based searchers each being configured to: obtain, for each of the plurality of terms, a posting list of one or more content items ranked based on term scores associated with a corresponding one or more content items, wherein a term score is indicative of a level of relevance between a corresponding content item in the posting list and the term, and generating a candidate list of content items based on the plurality of posting lists, wherein the step of generating the candidate list comprises: selecting, from each of the posting lists, a first content item having a rank with a same first value in each of the posting lists, retrieving, for each of the first content items, from the respective posting lists, an associated term score of the first content item, and creating the candidate list with the first content items that are ranked based on their corresponding term scores; updating the candidate list by: selecting, from each of the posting lists, a next content item having a next rank with a value lower than a previous rank value, for each of the next content items, if the next content item is not in the candidate list, inserting a new entry in the candidate list for the next content item with its content item identifier and its corresponding term score, and if the next content item is in the candidate list, summing all available term scores associated with the next content item in all of the posting lists, wherein the rank values associated with the all available term scores in all of the posting lists are equal or higher than the next rank value, and re-ranking the candidate list based on the summed term scores of the next content item, and repeating the step of selecting, inserting, summing, and re-ranking until the candidate list has been updated based on all of the one or more content items in each of the posting lists; determining, based on the candidate list, the candidate content items; and a query search result aggregator configured to provide, based on the candidate content items, a response to the query. 8. The

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11734285B2 cover?
Methods, systems, and programming for retrieving content items for a search are described herein. In a non-limiting embodiment, a query including a plurality of terms may be received. For each of the plurality of terms, a posting list of one or more content items may be obtained. The posting list may include a ranked list of term scores corresponding to the one or more content items, each of th…
Who is the assignee on this patent?
Verizon Patent & Licensing Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24578. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 22 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).