Search Relevance Model Using Self-Adversarial Negative Sampling

US2023252549A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2023252549-A1
Application number: US-202318107854-A
Country: US
Kind code: A1
Filing date: Feb 9, 2023
Priority date: Feb 9, 2022
Publication date: Aug 10, 2023
Grant date:

Abstract

Official abstract text for this publication.

To train an embedding-based model to determine relevance between items and queries, an online system generates training data from previously received queries and interactions with results for the queries. The training data includes positive training examples including a query and an item with which a user performed a specific interaction after providing the query. To generate negative training examples for the query to include in the training data, the online system determines measures of similarity between items with which the specific interaction was not performed and the query. The online system may weight a loss function for the embedding-based model by the measure of similarity for a negative example, increasing the effect of a negative example including a query and an item with a larger measure of similarity. In other embodiments, the online system selects negative training examples based on the measures of similarities between items and queries in pairs.
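The similarity weighting described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not in the publication: cosine similarity between embeddings serves as both the model score and the similarity measure, the per-negative weights are normalized with a softmax, and `temperature` is a hypothetical knob. The function names are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(xs, temperature=1.0):
    """Numerically stable softmax; temperature is an assumed parameter."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def weighted_loss(query_emb, pos_item_emb, neg_item_embs, temperature=1.0):
    """Log loss for one query where each negative is weighted by its similarity.

    Negatives that look more similar to the query get a larger weight,
    so they contribute more to the loss (assumed softmax weighting).
    """
    pos_score = cosine(query_emb, pos_item_emb)
    neg_scores = [cosine(query_emb, e) for e in neg_item_embs]
    weights = softmax(neg_scores, temperature)
    loss = -math.log(sigmoid(pos_score))
    for w, s in zip(weights, neg_scores):
        loss += -w * math.log(1.0 - sigmoid(s))
    return loss, weights

# Toy embeddings: the first negative is close to the query, the second is not.
loss, weights = weighted_loss([1.0, 0.0], [1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
print(loss, weights)
```

In this toy input the first negative receives the larger weight, which matches the abstract's statement that a negative example with a larger measure of similarity has a greater effect on the loss.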

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating a query-item relevance model, the method comprising: obtaining a training dataset that comprises a plurality of training examples, wherein each of the plurality of training examples is either a positive training example or a negative training example, and wherein each of the plurality of training examples comprises: a query comprising one or more terms, and item data describing one or more attributes of an item; adding a label to each of the plurality of training examples, wherein: the label for a positive training example indicates that a specific interaction was performed with the item after the query was received, and the label for a negative training example indicates that a specific interaction was not performed with the item after the query was received, the label further weighted by a measure of similarity between the item of the training example and the query of the training example; and accessing a machine learning model comprising a network of a plurality of layers, the machine learning model configured to receive an input query and an input item and to generate a measure of relevance of the input item to the input query; for each training example of the plurality of training examples of the training dataset: applying the machine learning model to the query of the training example and to the item of the training example, wherein the machine learning model outputs a measure of relevance based thereon, generating an error term based on a difference between the measure of relevance output from the machine learning model and the label of the training example, and backpropagating the error term to update a set of parameters of the machine learning model; and storing the set of parameters on a non-transitory computer readable storage medium as trained parameters of the query-item relevance model.

2. The method of claim 1, wherein the measure of similarity between the item embedding of the item of the training example and the query embedding of the query of the training example comprises the measure of relevance between the item of the training example and the query of the training example using the stored set of parameters for the machine learning model.

3. The method of claim 1, wherein the measure of similarity between the item embedding of the item of the training example and the query embedding of the query of the training example comprises a cosine similarity between the item of the training example and the query of the training example.

4. The method of claim 1, wherein labeling each of the plurality of training examples comprises: determining the measure of similarity between the item of the training example and the query of the training example, for each negative training example, by comparing an item embedding of the item of the training example and a query embedding of the query of the training example.

5. The method of claim 1, wherein obtaining the training dataset further comprises: selecting a subset of the generated one or more negative training examples for the query.

6. The method of claim 5, wherein selecting the subset of the generated one or more negative training examples for the query comprises: selecting negative training examples from the generated one or more negative training examples based on a measure of similarity between the query and an item included in a negative training example.

7. The method of claim 6, wherein the measure of similarity between the query and the item included in the negative training example comprises the measure of relevance between the item of the negative training example and the query using the stored set of parameters for the machine learning model.

8. The method of claim 6, wherein the measure of similarity between the query and the item included in the negative training example comprises a cosine similarity between the item of the negative training example and the query.

9. The method of claim 5, wherein selecting the subset of the generated one or more negative training examples for the query comprises: selecting negative training examples based on a probability distribution where a probability of selecting a negative training example is based on a measure of similarity between an item included in the negative training example and the query.

10. The method of claim 1, wherein the network of the plurality of layers comprises a query encoder configured to generate a query embedding for the query of the training example, an item encoder configured to generate an item embedding for the item of the training example, and a fusion layer configured to generate the measure of relevance from the query embedding and the item embedding.

11. A product comprising a query-item relevance model stored on a non-transitory computer readable storage medium, wherein query-item relevance model is manufactured by a process comprising: obtaining a training dataset that comprises a plurality of training examples, wherein each of the plurality of training examples is either a positive training example or a negative training example, and wherein each of the plurality of training examples comprises: a query comprising one or more terms, and item data describing one or more attributes of an item; adding a label to each of the plurality of training examples, wherein: the label for a positive training example indicates that a specific interaction was performed with the item after the query was received, and the label for a negative training example indicates that a specific interaction was not performed with the item after the query was received, the label further weighted by a measure of similarity between the item of the training example and the query of the training example; and accessing a machine learning model comprising a network of a plurality of layers, the machine learning model configured to receive an input query and an input item and to generate a measure of relevance of the input item to the input query; for each training example of the plurality of training examples of the training dataset: applying the machine learning model to the query of the training example and to the item of the training example, wherein the machine learning model outputs a measure of relevance based thereon, generating an error term based on a difference between the measure of relevance output from the machine learning model and the label of the training example, and backpropagating the error term to update a set of parameters of the machine learning model; and storing the set of parameters on a non-transitory computer readable storage medium as trained parameters of the query-item relevance model.

12. The product of claim 11, wherein the measure of similarity between the item embedding of the item of the training example and the query embedding of the query of the training example comprises the measure of relevance between the item of the training example and the query of the training example using the stored set of parameters for the machine learning model.

13. The product of claim 11, wherein the measure of similarity between the item embedding of the item of the training example and the query embedding of the query of the training example comprises a cosine similarity between the item of the training example and the query of the training example.

14. The product of claim 11, wherein labeling each of the plurality of training examples comprises: determining the measure of similarity between the item of the training example and the query of the training example, for each negative
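
The selection of negatives from a probability distribution, as recited in claim 9, could look something like the sketch below. The claim only states that the selection probability is based on a measure of similarity; the exponential weighting, the `temperature` parameter, sampling without replacement, and the name `sample_negatives` are all assumptions made for illustration.

```python
import math
import random

def sample_negatives(similarities, k, temperature=1.0, seed=None):
    """Pick up to k candidate indices without replacement.

    The chance of picking a candidate grows with its query-item similarity
    (assumed weighting: exp(similarity / temperature)).
    """
    rng = random.Random(seed)
    remaining = list(range(len(similarities)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        # Subtract the max before exponentiating for numerical stability.
        m = max(similarities[i] for i in remaining)
        weights = [math.exp((similarities[i] - m) / temperature) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen

# Hypothetical similarities for three candidate negative items of one query.
sims = [0.9, 0.1, -0.5]
print(sample_negatives(sims, k=2, seed=0))
```

A lower `temperature` concentrates the selection on the most similar (hardest) negatives, while a higher value moves it toward uniform sampling. Which setting the online system uses, if any, is not stated in the text shown here.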

Assignees

Maplebear Inc Dba Instacart

Inventors

Classifications

  • Recommending goods or services

  • Market modelling; Market analysis; Collecting market data

Frequently asked questions

What does patent US2023252549A1 cover?
To train an embedding-based model to determine relevance between items and queries, an online system generates training data from previously received queries and interactions with results for the queries. The training data includes positive training examples including a query and an item with which a user performed a specific interaction after providing the query. To generate negative training …
Who is the assignee on this patent?
Maplebear Inc Dba Instacart
What technology area does this patent fall under?
Primary CPC classification G06Q30/0631. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 10, 2023 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).