What technology area does this patent fall under?

Primary CPC classification G06F16/24534. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 11, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (cited publications in our corpus, or other publications sharing the same primary CPC classification).

Training a machine learned model to determine relevance of items to a query using different sets of training data from a common domain

US12222937B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12222937-B2
Application number	US-202217668358-A
Country	US
Kind code	B2
Filing date	Feb 9, 2022
Priority date	Feb 9, 2022
Publication date	Feb 11, 2025
Grant date	Feb 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection. Read this to see what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An online concierge system maintains various items and an item embedding for each item. When the online concierge system receives a query for retrieving one or more items, the online concierge system generates an embedding for the query. The online concierge system trains a machine-learned model to determine a measure of relevance of an embedding for a query to item embeddings by generating training data of examples including queries and items with which users performed a specific interaction. The online concierge system generates a subset of the training data including examples satisfying one or more criteria and further trains the machine-learned model by application to the examples of the subset of the training data and stores parameters resulting from the further training as parameters of the machine-learned model.
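The abstract describes scoring a query embedding against stored item embeddings to retrieve items. As a minimal sketch, assuming relevance is approximated by a plain dot product (the patent instead learns the relevance measure with a machine-learned model), top-k retrieval could look like this. The function names `dot` and `rank_items` and the sample embeddings are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def rank_items(
    query_embedding: Sequence[float],
    item_embeddings: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k items whose embeddings score highest against the query.

    Assumption: relevance is a dot product. In the patent, the relevance
    measure is produced by a trained machine-learned model instead.
    """
    scored = [(item_id, dot(query_embedding, emb)) for item_id, emb in item_embeddings.items()]
    # Highest score first; ties are broken by item id so results are deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical catalog of item embeddings.
items = {
    "milk": [0.9, 0.1, 0.0],
    "bread": [0.2, 0.8, 0.1],
    "eggs": [0.7, 0.3, 0.2],
}
print(rank_items([1.0, 0.0, 0.5], items, k=2))
```

Swapping `dot` for a learned scoring function would leave `rank_items` unchanged, which is why the scorer is kept separate from the ranking step.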

First claim

Opening claim text (preview).

What is claimed is:

1. A machine-learned model stored on a non-transitory computer readable storage medium, wherein the machine-learned model is manufactured by a process comprising: generating training data comprising a plurality of examples, each example comprising a query received by an online concierge system and an item with which a user of the online concierge system performed a specific interaction, wherein a label applied to each example of the training data indicates whether the specific interaction was performed with the item after the online concierge system received the query; generating a noisy subset of training data and a high-quality subset of training data based on a metric that defines a quality of training data, wherein the high-quality subset has a higher metric value than the noisy subset; initializing the machine-learned model comprising a network of a plurality of layers, the machine-learned model configured to receive a query and an item and to generate a predicted measure of relevance of the item to the query; for each of a plurality of the examples of the noisy subset of the training data: applying, by one or more processors, the machine-learned model to the query of the example of the noisy subset of the training data and to the item of the example of the noisy subset of the training data; backpropagating, in one or more iterations and by the one or more processors, one or more error terms obtained from one or more loss functions to update a set of parameters of the network, the backpropagating performed through the network and one or more of the error terms based on a difference between the label applied to the example of the noisy subset of the training data and a predicted measure of relevance of the item of the example of the noisy subset of the training data and to the query of the example of the training data; stopping, by the one or more processors, the backpropagation after the one or more loss functions satisfy one or more criteria; storing, in the computer readable storage medium, the set of parameters of the network that are updated in the one or more iterations; initializing the network to the stored set of parameters; for each of the plurality of the examples of the high-quality subset of the training data: applying, by the one or more processors, the machine-learned model to the query of the example of the high-quality subset of the training data and to the item of the example of the high-quality subset of the training data; backpropagating, by the one or more processors, one or more error terms obtained from one or more loss functions to generate a modified set of parameters of the network, the backpropagating performed through the network and one or more of the error terms based on a difference between a label applied to the example of the high-quality subset of the training data and a predicted measure of relevance of the item of the example of the high-quality subset of the training data and to the query of the example of the subset of the training data; stopping, by the one or more processors, the backpropagation after the one or more loss functions satisfy one or more criteria; and storing, in the computer readable storage medium, the modified set of parameters of the network trained from the subset of the training data as parameters of the machine-learned model.

2. The machine-learned model of claim 1, wherein generating the high-quality subset of training data comprises: selecting examples of the training data including items with which the specific interaction was performed with at least a threshold frequency.

3. The machine-learned model of claim 2, wherein generating the high-quality subset of training data further comprises: determining an example of the training data includes an item with which the specific frequency was performed with at least an additional threshold frequency; and including a specific number of replicas of the example determined to include the item with which the specific frequency was performed with at least the additional threshold frequency in the subset of the training data in response to the determining.

4. The machine-learned model of claim 1, wherein generating the high-quality subset of training data comprises: ranking examples of the training data based on frequencies with which the specific interaction was performed with items included in the examples of the training data; selecting examples of the training data having at least a threshold position in the ranking.

5. The machine-learned model of claim 4, wherein generating the high-quality subset of training data further comprises: determining an example of the training data includes an item with which the specific frequency was performed with at least a threshold frequency; and including a specific number of replicas of the example determined to include the item with which the specific frequency was performed with at least the threshold frequency in the subset of the training data in response to the determining.

6. The machine-learned model of claim 1, wherein the specific interaction comprises including the item in an order received by the online concierge system.

7. The machine-learned model of claim 1, wherein backpropagating one or more error terms obtained from one or more loss functions to modify the set of parameters of the network comprises: generating the one or more error terms from application of the machine-learned model to the example of the high-quality subset of the training data using an alternative loss function than a loss function generating the error term from application of the machine-learned model to the example of the training data.

8. The machine-learned model of claim 7, wherein the alternative loss function applies a higher weight to an error term from application of the machine-learned model to the example of the high-quality subset of the training data than the loss function generating the error term from application of the machine-learned model to the noisy subset of the training data.

9. The machine-learned model of claim 1, wherein applying the machine-learned model to the query of the example of the noisy subset of the training data and to the item of the example of the noisy subset of the training data comprises: applying the machine-learned model with a particular architecture to the example of the noisy subset of the training data and to the item of the example of the noisy subset of the training data.

10. The machine-learned model of claim 9, wherein applying the machine-learned model to the query of the example of the high-quality subset of the training data and to the item of the example of the high-quality subset of the training data comprises: applying the machine-learned model with a different architecture than the particular architecture to the example of the high-quality subset of the training data and to the item of the high-quality subset of the example of the training data.

11. A method comprising: generating training data comprising a plurality of examples, each example comprising a query received by an online concierge system and an item with which a user of the online concierge system performed a specific interaction, wherein a label applied to each example of the training data indicates whether the specific interaction was performed with the item after the online concierge system received the query; generating a noisy subset of training data and a high-quality subset of training data based on a metric that defines a quality of training data, wherein the high-quality subset has a higher metric value than the noisy subset; initializing a machine-learned model comprising a networ
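Claims 1 to 3 and 8 describe a two-stage procedure: train on a noisy subset, store the parameters, re-initialize from them, then continue training on a high-quality subset selected by item interaction frequency, optionally with replicated examples and a more heavily weighted loss. The sketch below illustrates that flow under loud assumptions: the "network" is a single-layer logistic model over the elementwise product of fixed query and item embeddings (not the multi-layer network the claims recite), the quality metric is the count of positive examples per item, the noisy subset is everything not selected as high-quality, and all names (`split_by_item_frequency`, `predict`, `train`), embeddings, and hyperparameters are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical example format: (query, item, label), where label is 1.0 if the user
# performed the specific interaction with the item after the query, else 0.0.
Example = Tuple[str, str, float]
Params = Tuple[List[float], float]  # (per-dimension weights, bias)


def split_by_item_frequency(
    examples: Sequence[Example], threshold: int, replica_threshold: int, replicas: int
) -> Tuple[List[Example], List[Example]]:
    """Return (noisy, high_quality); assumes quality = positive examples per item."""
    freq = Counter(item for _, item, label in examples if label > 0)
    noisy: List[Example] = []
    high_quality: List[Example] = []
    for example in examples:
        count = freq[example[1]]
        if count >= threshold:
            # Examples whose item is very frequent are replicated (cf. claim 3).
            copies = replicas if count >= replica_threshold else 1
            high_quality.extend([example] * copies)
        else:
            noisy.append(example)
    return noisy, high_quality


def predict(params: Params, q: Sequence[float], i: Sequence[float]) -> float:
    """Predicted relevance: sigmoid of a weighted elementwise product plus bias."""
    weights, bias = params
    z = sum(w * qk * ik for w, qk, ik in zip(weights, q, i)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(
    params: Params,
    examples: Sequence[Example],
    q_emb: Dict[str, List[float]],
    i_emb: Dict[str, List[float]],
    lr: float = 0.5,
    loss_weight: float = 1.0,
    target_loss: float = 0.1,
    max_epochs: int = 1000,
) -> Params:
    """SGD on log loss; stops once the mean (unweighted) epoch loss is below target_loss."""
    weights, bias = list(params[0]), params[1]  # copy so stored parameters stay untouched
    if not examples:
        return weights, bias
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for query, item, label in examples:
            q, i = q_emb[query], i_emb[item]
            p = predict((weights, bias), q, i)
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)
            epoch_loss -= label * math.log(p_safe) + (1.0 - label) * math.log(1.0 - p_safe)
            # d(log loss)/dz is (p - label); loss_weight upweights this stage (cf. claim 8).
            grad = loss_weight * (p - label)
            for k in range(len(weights)):
                weights[k] -= lr * grad * q[k] * i[k]
            bias -= lr * grad
        if epoch_loss / len(examples) < target_loss:
            break
    return weights, bias


q_emb = {"milk": [1.0, 0.0], "bread": [0.0, 1.0]}
i_emb = {"whole milk": [1.0, 0.0], "oat milk": [0.8, 0.2], "sourdough": [0.0, 1.0], "rye": [0.1, 0.9]}
data: List[Example] = (
    [("milk", "oat milk", 1.0), ("bread", "rye", 0.0)]
    + [("milk", "whole milk", 1.0)] * 3
    + [("bread", "sourdough", 1.0)] * 3
    + [("milk", "sourdough", 0.0), ("bread", "whole milk", 0.0)]
)
noisy, high_quality = split_by_item_frequency(data, threshold=2, replica_threshold=3, replicas=2)

stage1 = train(([0.0, 0.0], 0.0), noisy, q_emb, i_emb)            # stage 1: noisy subset
stored = (list(stage1[0]), stage1[1])                              # store, then re-initialize
final = train(stored, high_quality, q_emb, i_emb, loss_weight=2.0)  # stage 2: high-quality subset
print(predict(stage1, q_emb["bread"], i_emb["sourdough"]), predict(final, q_emb["bread"], i_emb["sourdough"]))
```

On this toy data, the single `rye` negative in the noisy subset pushes the second-dimension weight below zero, so the stage-1 model scores `bread`/`sourdough` below 0.5; the second stage on the replicated high-quality subset reverses that weight. Because `train` copies its input, the stored stage-1 parameters remain available, mirroring the claim's separate store and re-initialize steps.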

Assignees

Maplebear Inc

Inventors

Classifications

G06F18/2148
characterised by the process organisation or structure, e.g. boosting cascade · CPC title
G06F16/283
Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title
G06F16/24578
using ranking · CPC title
G06N20/00
Machine learning · CPC title
G06F16/2448
for particular applications; for extensibility, e.g. user defined types · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 87564886



Related patents

Machine-learning based generation of text style variations for digital content items

System and method for improving machine learning models by detecting and removing inaccurate training data

Optimizing task assignments in a delivery system

Convolutional neural networks using resistive processing unit array
