False negative prediction for training a machine-learning model

US12450277B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12450277-B2
Application number: US-202418932301-A
Country: US
Kind code: B2
Filing date: Oct 30, 2024
Priority date: Nov 2, 2023
Publication date: Oct 21, 2025
Grant date: Oct 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An online system updates the labels on negative examples to account for the possibility that the example is a false negative. The system generates a set of initial training examples that each include a query input by the user and item data for an item presented as a result to the user's query. Each training example also includes an initial label, which represents whether the user interacted with the item presented as a search result. The online system updates the initial label for a negative training example by identifying a set of bridge queries and computing a similarity score between the query for the training example and the bridge queries. The online system computes an updated label for the negative example based on the similarity scores and updates the training example with the updated label.
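
The relabeling idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the patented implementation: the function names `jaccard` and `relabel_negatives`, the `(query, item, label)` tuple layout, the 0.5 threshold, and the use of token overlap as the similarity score are all assumptions made for the example (the patent describes the score only as a similarity between the query and the bridge queries).

```python
from collections import defaultdict


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity. Assumption: stands in for whatever
    query-similarity score the real system would use."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def relabel_negatives(log, similarity=jaccard, threshold=0.5):
    """Soften the labels of negative examples using bridge queries.

    log: list of (query, item, label) tuples, label 1.0 if the item was
    shown for the query and selected, 0.0 if shown but not selected.
    """
    # For each item, collect the queries for which it was selected.
    selected_queries = defaultdict(set)
    for query, item, label in log:
        if label > threshold:
            selected_queries[item].add(query)

    updated = []
    for query, item, label in log:
        if label < threshold:
            # Bridge queries: other queries where this same item was selected.
            bridges = sorted(selected_queries.get(item, ()))
            if bridges:
                scores = [similarity(query, b) for b in bridges]
                label = sum(scores) / len(scores)  # average similarity
        updated.append((query, item, label))
    return updated


log = [
    ("organic milk", "milk-1", 1.0),
    ("whole milk", "milk-1", 1.0),
    ("milk", "milk-1", 0.0),   # likely a false negative
    ("bread", "milk-1", 0.0),  # likely a true negative
    ("milk", "soap-9", 0.0),   # no bridge queries, label unchanged
]
for row in relabel_negatives(log):
    print(row)
```

In this toy log, the negative example for the query "milk" shares tokens with both bridge queries, so its label moves away from zero, while the "bread" example keeps a label of zero because it has no overlap with any query that led to a selection of the item.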

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable medium storing a set of parameters for a machine-learning model, wherein the parameters are produced by a process comprising:
    initializing the set of parameters for the machine-learning model;
    accessing search log data captured by an online system, wherein the search log data describes a plurality of queries placed by users of the online system and a plurality of items;
    generating a set of initial training examples based on the search log data, wherein each initial training example comprises a query of the plurality of queries, an item of the plurality of items, and an initial label, wherein the initial label represents whether the item was presented as a search result to a user and was selected by the user;
    identifying a set of initial negative examples by identifying a subset of the set of initial training examples with initial labels below a threshold value;
    updating the set of initial negative examples to generate a set of updated negative examples, wherein the set of updated negative examples is generated by, for each initial negative example:
        identifying a plurality of bridge queries for the query of the initial negative example, wherein the plurality of bridge queries is a subset of the plurality of queries for which the item of the initial negative example was presented as a search result to a user and was selected by the user;
        computing a similarity score between the query of the initial negative example and each of the plurality of bridge queries;
        computing an updated label for the initial negative example based on the computed similarity scores; and
        generating an updated negative example comprising the computed updated label and the query and item of the initial negative example;
    generating a final set of training examples comprising the set of updated negative examples and a subset of the set of initial training examples with initial labels above the threshold value;
    updating the set of parameters by processing each training example in the final set of training examples, wherein updating the set of parameters results in an updated set of parameters for the machine-learning model, and wherein processing each training example in the set of training examples comprises:
        applying the machine-learning model to the item data and the query of the training example to generate an item prediction score, wherein the item prediction score represents a predicted likelihood that a user would select the item when the item is presented as a search result for the query;
        computing a loss score by comparing the item prediction score to a label of the training example; and
        updating the set of parameters for the machine-learning model through a backpropagation process using the computed loss score; and
    storing the updated set of parameters on the computer-readable medium.

2. The computer-readable medium of claim 1, wherein the machine-learning model is a cross encoder model.

3. The computer-readable medium of claim 1, wherein each initial training example of the set of initial training examples comprises user data describing a user corresponding to the query of the initial training example.

4. The computer-readable medium of claim 1, wherein each initial training example of the set of initial training examples comprises context data describing a context of the query of the initial training example.

5. The computer-readable medium of claim 1, wherein computing a similarity score between the query of the initial negative example and a bridge query comprises:
    applying a query embedding model to the query and the bridge query to generate embeddings for the query and bridge query; and
    computing a distance between the embedding for the query and the embedding for the bridge query.

6. The computer-readable medium of claim 5, wherein the embedding model is part of a bi-encoder model that is trained to generate query embeddings and item embeddings for use in selecting items for search results.

7. The computer-readable medium of claim 1, wherein computing an updated label for the initial negative examples comprises: computing an average of the computed similarity scores.

8. The computer-readable medium of claim 1, further comprising: filtering the set of updated negative examples based on the updated labels.

9. A method, performed by a computing system comprising a processor and a non-transitory computer-readable medium, comprising:
    initializing a set of parameters for a machine-learning model;
    accessing search log data captured by an online system, wherein the search log data describes a plurality of queries placed by users of the online system and a plurality of items;
    generating a set of initial training examples based on the search log data, wherein each initial training example comprises a query of the plurality of queries, an item of the plurality of items, and an initial label, wherein the initial label represents whether the item was presented as a search result to a user and was selected by the user;
    identifying a set of initial negative examples by identifying a subset of the set of initial training examples with initial labels below a threshold value;
    updating the set of initial negative examples to generate a set of updated negative examples, wherein the set of updated negative examples is generated by, for each initial negative example:
        identifying a plurality of bridge queries for the query of the initial negative example, wherein the plurality of bridge queries is a subset of the plurality of queries for which the item of the initial negative example was presented as a search result to a user and was selected by the user;
        computing a similarity score between the query of the initial negative example and each of the plurality of bridge queries;
        computing an updated label for the initial negative example based on the computed similarity scores; and
        generating an updated negative example comprising the computed updated label and the query and item of the initial negative example;
    generating a final set of training examples comprising the set of updated negative examples and a subset of the set of initial training examples with initial labels above the threshold value;
    updating the set of parameters by processing each training example in the final set of training examples, wherein updating the set of parameters results in an updated set of parameters for the machine-learning model, and wherein processing each training example in the set of training examples comprises:
        applying the machine-learning model to the item data and the query of the training example to generate an item prediction score, wherein the item prediction score represents a predicted likelihood that a user would select the item when the item is presented as a search result for the query;
        computing a loss score by comparing the item prediction score to a label of the training example; and
        updating the set of parameters for the machine-learning model through a backpropagation process using the computed loss score; and
    storing the updated set of parameters on the computer-readable medium.

10. The method of claim 9, wherein the machine-learning model is a cross encoder model.

11. The method of claim 9, wherein each initial training example of the set of initial training examples comprises user data describing a user corresponding to the query of the initial training example.

12. The method of claim 9, wherein each initial training example of the set of initial training examples comprises context data describing a context of the query of the initial training example.

13. The method of claim 9
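
Two parts of the claims lend themselves to a short illustration: the embedding-based similarity averaged into an updated label (claims 5 and 7), and the per-example training step that compares a prediction score to a possibly fractional label (the final steps of claims 1 and 9). The Python sketch below is hypothetical throughout. The claims say only that a "distance" between embeddings is computed, so cosine similarity is an assumption, as is clamping the averaged label to the range 0 to 1. A single-layer logistic model with a hand-derived gradient stands in for the cross encoder and the backpropagation process, and binary cross-entropy is an assumed choice of loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def updated_label(query_vec, bridge_vecs):
    """Average similarity to the bridge queries, clamped to [0, 1].
    The clamp is an assumption so the result is usable as a soft label."""
    mean = sum(cosine(query_vec, b) for b in bridge_vecs) / len(bridge_vecs)
    return max(0.0, min(1.0, mean))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, features, label, lr=0.1):
    """One update on one training example with a soft label.

    Returns (new_weights, loss). The model is a toy logistic scorer;
    the gradient of sigmoid + binary cross-entropy with respect to
    each weight is (prediction - label) * feature.
    """
    score = sigmoid(sum(w * x for w, x in zip(weights, features)))
    eps = 1e-12  # avoids log(0)
    loss = -(label * math.log(score + eps)
             + (1.0 - label) * math.log(1.0 - score + eps))
    new_weights = [w - lr * (score - label) * x
                   for w, x in zip(weights, features)]
    return new_weights, loss


# Hypothetical embeddings for a negative example's query and two bridge queries.
label = updated_label([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
weights = [0.0, 0.0]
for _ in range(3):
    weights, loss = train_step(weights, [1.0, 2.0], label)
    print(round(loss, 4), weights)
```

Because the updated label is a value between 0 and 1 rather than a hard 0, the loss no longer pushes the prediction score for a probable false negative all the way toward zero.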

Assignees

Inventors

Classifications

  • Query formulation · CPC title

  • Search customisation based on user profiles and personalisation · CPC title

  • using ranking · CPC title

  • Clustering; Classification · CPC title

  • using metadata automatically derived from the content · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12450277B2 cover?
An online system updates the labels on negative examples to account for the possibility that the example is a false negative. The system generates a set of initial training examples that each include a query input by the user and item data for an item presented as a result to the user's query. Each training example also includes an initial label, which represents whether the user interacted wit…
Who is the assignee on this patent?
Maplebear Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 21, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).