Accounting for positional bias in a document retrieval system using machine learning

US10565265B2 · US · B2

Patent metadata
Publication number: US-10565265-B2
Application number: US-201615292033-A
Country: US
Kind code: B2
Filing date: Oct 12, 2016
Priority date: Oct 12, 2016
Publication date: Feb 18, 2020
Grant date: Feb 18, 2020

Abstract

Official abstract text for this publication.

A document retrieval system tracks user selections of documents from query search results and uses the selections as proxies for manual user labeling of document relevance. The system trains a model representing the significance of different document features when calculating true document relevance for users. To factor in positional biases inherent in user selections in search results, the system learns positional bias values for different search result positions, such that the positional bias values are accounted for when computing document feature weights that are used to compute true document relevance.
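The idea in the abstract can be illustrated with a small position-based click model. The following is only a sketch under stated assumptions, not the method disclosed in the patent: it assumes the click probability factors as (positional bias) × (relevance), models relevance as a logistic function of weighted feature values, and fits both by stochastic gradient ascent on the click log-likelihood. The function names, the logistic form, and the training procedure are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_click_model(impressions, n_features, n_positions, epochs=100, lr=0.01):
    """Jointly learn feature weights and per-position bias values.

    impressions: list of (feature_values, position, clicked) tuples, where
    position is a 0-based index into the result list and clicked is 0 or 1.
    Assumed model (hypothetical): P(click) = bias[position] * sigmoid(w . x).
    """
    w = [0.0] * n_features      # document feature weights
    a = [0.0] * n_positions     # bias logits; bias[k] = sigmoid(a[k])
    data = list(impressions)    # copy so the caller's list is not reordered
    for _ in range(epochs):
        random.shuffle(data)
        for x, k, clicked in data:
            rel = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            bias = sigmoid(a[k])
            p = min(max(bias * rel, 1e-9), 1.0 - 1e-9)
            # Gradient of the Bernoulli log-likelihood, simplified:
            #   d/dw    = (clicked - p) * (1 - rel)  / (1 - p) * x
            #   d/da[k] = (clicked - p) * (1 - bias) / (1 - p)
            common = (clicked - p) / (1.0 - p)
            for i, xi in enumerate(x):
                w[i] += lr * common * (1.0 - rel) * xi
            a[k] += lr * common * (1.0 - bias)
    return w, [sigmoid(v) for v in a]
```

Under this factorization only the ratios between bias values are identifiable from click data, because scaling every bias up and every relevance down by the same factor yields the same click probabilities. The learned biases are therefore best read relative to one another (for example, position 0 versus position 2) rather than as absolute probabilities.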

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: accessing query data comprising, for each respective query of a plurality of queries: the respective query, respective feature values, of each respective document of a plurality of documents presented to a first user in response to the respective query, for each respective document feature of a plurality of document features, wherein the feature values include a number of times a query component of the query appears in content of the respective document, a number of times a query component of the query appears in a title of the respective document, and a number of times that the respective document has been viewed by any user, a position of a selected one of the documents selected by the first user within query search results, and a user interface type corresponding to a user interface used by the first user to select the selected one of the documents; performing machine learning to compute: document feature weights to apply to feature values of each of the plurality of documents to determine relevances of each of the plurality of documents to users, and positional bias values that when combined with the determined relevances represent probabilities that users will click on each of the plurality of documents when the plurality of documents are presented at given positions within search results for a given query, the positional bias values being adjusted based on the user interface type; receiving a query from a second user from a user interface having a second user interface type; identifying matching documents matching the query; determining feature values of the matching documents; determining relevance scores of the matching documents by applying the computed document feature weights to the determined feature values of the matching documents and adjusting the relevance scores based on the second user interface type; and presenting the matching documents to the second user in an order based on the determined relevance scores.

2. A computer-implemented method comprising: accessing query data comprising, for each respective query of a plurality of queries: the respective query, respective feature values, of each respective document of a plurality of documents presented to a first user in response to the query, for each respective document feature of a plurality of document features, a position of a selected one of the documents selected by the user within query search results, and a user interface type corresponding to a user interface used by the user to select the selected one of the documents, wherein the feature values include a number of times a query component of the query appears in content of the respective document, a number of times a query component of the query appears in a title of the respective document, and a number of times that the respective document has been viewed by any user; and performing machine learning to learn feature weights to apply to values of document features of each of the plurality of documents to determine relevances of each of the plurality of documents to users, and to learn positional bias values that when combined with the determined relevances represent probabilities that users will click on each of the plurality of documents when they are presented at given positions within search results for a given query, the positional bias values being adjusted based on the user interface type.

3. The computer-implemented method of claim 2, further comprising: receiving a query from a second user; identifying matching documents matching the query; determining document feature values of the matching documents; determining relevance scores of the matching documents by applying the learned feature weights to the determined document feature values of the matching documents; and presenting the matching documents to the second user in an order based on the determined relevance scores.

4. The computer-implemented method of claim 2, wherein the query data was obtained from search results presented in a first user interface, and wherein the computer-implemented method further comprises modifying the user interface based on the positional bias values.

5. The computer-implemented method of claim 2, wherein the machine learning is performed using neural networks.

6. A non-transitory computer-readable storage medium storing instructions executable by a computer processor and comprising: instructions for accessing query data comprising, for each respective query of a plurality of queries: the respective query, respective feature values, of each respective document of a plurality of documents presented to a first user in response to the query, for each respective feature of a plurality of document features, a position of a selected one of the documents selected by the first user within query search results, and a user interface type corresponding to a user interface used by the first user to select the selected one of the documents, wherein the feature values include a number of times a query component of the query appears in content of the respective document, a number of times a query component of the query appears in a title of the respective document, and a number of times that the respective document has been viewed by any user; and instructions for computing feature weights to apply to values of document features of each of the plurality of documents to determine relevances of each of the plurality of documents to users, and for computing positional bias values that when combined with the determined relevances represent probabilities that users will click on each of the plurality of documents when they are presented at given positions within search results for a given query, the positional bias values being adjusted based on the user interface type.

7. The non-transitory computer-readable storage medium of claim 6, further comprising: receiving a query from a second user; identifying matching documents matching the query; determining document feature values of the matching documents; determining relevance scores of the matching documents by applying the learned feature weights to the determined document feature values of the matching documents; and presenting the matching documents to the second user in an order based on the determined relevance scores.

8. The non-transitory computer-readable storage medium of claim 6, wherein the query data was obtained from search results presented in a first user interface, and wherein the instructions further comprise instructions for modifying the user interface based on the positional bias values.

9. The non-transitory computer-readable storage medium of claim 6, wherein the machine learning is performed using neural networks.
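The query-time steps recited in claims 1 and 3 (determine feature values, apply learned weights, order by relevance) can be sketched as follows. This is a hedged illustration, not the patented implementation. It assumes a "query component" is a whitespace-separated term, that relevance is a logistic function of the weighted feature values, and that positional bias values are stored in a table keyed by (user interface type, position). The names `feature_values`, `relevance`, `click_probability`, and `rank`, and the dictionary layout of a document, are hypothetical. The claims do not say how relevance scores are adjusted for the second user interface type, so that step is left out.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def feature_values(query, title, content, view_count):
    """The three feature values named in the claims, for one document.

    Assumption: query components are lowercase whitespace-separated terms.
    """
    terms = query.lower().split()
    content_tokens = content.lower().split()
    title_tokens = title.lower().split()
    return [
        float(sum(content_tokens.count(t) for t in terms)),  # hits in content
        float(sum(title_tokens.count(t) for t in terms)),    # hits in title
        float(view_count),                                   # views by any user
    ]


def relevance(weights, features):
    """Relevance score from learned feature weights (assumed logistic form)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def click_probability(weights, features, bias_table, ui_type, position):
    """Positional bias combined with relevance, keyed by UI type and position."""
    return bias_table[(ui_type, position)] * relevance(weights, features)


def rank(query, documents, weights):
    """Return document ids ordered by descending relevance score."""
    scored = [
        (relevance(weights, feature_values(query, d["title"], d["content"], d["views"])), d["id"])
        for d in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

Note that ranking uses relevance alone. In this sketch the positional bias only enters when predicting clicks, which is what lets training separate "clicked because it was relevant" from "clicked because it was shown first". Keying the bias table by user interface type reflects the claim language that bias values are adjusted per interface, for example a mobile list that shows fewer results per screen than a desktop page.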

Assignees

Salesforce Com Inc

Inventors

Classifications

  • G06F16/93 (Primary)

    Document management systems (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10565265B2 cover?
A document retrieval system tracks user selections of documents from query search results and uses the selections as proxies for manual user labeling of document relevance. The system trains a model representing the significance of different document features when calculating true document relevance for users. To factor in positional biases inherent in user selections in search results, the sys…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/93. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 18, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).