Computerized systems and methods for using artificial intelligence to generate product recommendations
US-11386478-B1 · Jul 12, 2022 · US
US12210591B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12210591-B2 |
| Application number | US-202117407158-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 19, 2021 |
| Priority date | Aug 19, 2021 |
| Publication date | Jan 28, 2025 |
| Grant date | Jan 28, 2025 |
Official abstract text for this publication.
An online concierge system receives unstructured data describing items offered for purchase by various warehouses. To generate attributes for products from the unstructured data, the online concierge system extracts candidate values for attributes from the unstructured data through natural language processing. One or more users associate a subset of candidate values with corresponding attributes, and the online concierge system clusters the remaining candidate values with the candidate values of the subset associated with attributes. One or more users provide input on the accuracy of the generated clusters. The candidate values are applied as labels to items by the online concierge system, which uses the labeled items as training data for an attribute extraction model to predict values for one or more attributes from unstructured data about an item.
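The extraction and subset-selection steps summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes that a "segment" is a word n-gram of one or two tokens and that "frequency" means the number of item names containing the segment. The function names, the `max_len` parameter, and the sample item names are all hypothetical.

```python
from collections import Counter


def extract_segments(name, max_len=2):
    """Return the set of word n-grams (length 1..max_len) found in an item name."""
    tokens = name.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def segment_frequencies(names, max_len=2):
    """Count, for each segment, how many item names contain it."""
    counts = Counter()
    for name in names:
        # Each name contributes a set, so a segment counts once per item.
        counts.update(extract_segments(name, max_len))
    return counts


def select_by_threshold(counts, min_count):
    """Keep candidate values that occur in at least min_count items."""
    return {seg for seg, c in counts.items() if c >= min_count}


def select_top_ranked(counts, k):
    """Keep the k most frequent candidate values (ties broken arbitrarily)."""
    return [seg for seg, _ in counts.most_common(k)]


# Hypothetical item names sharing a common characteristic (all are milk).
names = [
    "Organic Whole Milk",
    "Organic 2% Milk",
    "Lactose Free Whole Milk",
    "Chocolate Milk",
]
counts = segment_frequencies(names)
frequent = select_by_threshold(counts, min_count=2)
top = select_top_ranked(counts, k=1)
```

The frequent candidates are the ones a human reviewer would then associate with attributes to form the seed set. The two selection helpers correspond to the two options the claims describe: a frequency threshold or a position in a frequency ranking.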
Opening claim text (preview).
What is claimed is:

1. A method comprising: obtaining unstructured data describing items for display by an online concierge system, the unstructured data including a name of each item; identifying a set of items each having a common characteristic; extracting candidate values for attributes as segments from names of each item of the set, each candidate value associated with a frequency with which the segment occurs in the set of items; identifying a subset of candidate values based on frequency of occurrence in the set of items; generating a seed set of candidate values for one or more attributes from inputs received from one or more users associating candidate values of the subset with one or more attributes; generating clusters of candidate values from distances between candidate values not included in the subset and candidate values of the subset associated with one or more attributes, each cluster of candidate value corresponding to an attribute and including candidate values that are potential values for the attribute; receiving input from one or more users manually reviewing the generated clusters for accuracy; applying one or more labels to each item of the set, a label applied to an item of the set indicating a candidate value corresponding to an attribute matching a segment extracted from a name of the item of the set; generating training data including a plurality of examples, each example including an identifier of the item of the set and the one or more labels applied to the item of the set; and training an attribute extraction model to predict values for one or more attributes of an item from unstructured data describing the item by applying the attribute extraction model to the plurality of examples of the training data.

2. The method of claim 1, wherein identifying the subset of candidate values based on frequency of occurrence in the set of items comprises: selecting candidate values having at least a threshold frequency of occurrence in the set of items.

3. The method of claim 1, wherein identifying the subset of candidate values based on frequency of occurrence in the set of items comprises: ranking candidate values based on frequency of occurrence in the set of items; and selecting candidate values having at least a threshold position in the ranking.

4. The method of claim 1, wherein generating clusters of candidate values from distances between candidate values not included in the subset and candidate values of the subset associated with one or more attributes comprises: identifying seed clusters as candidate values of the subset associated each associated with a common attribute; identifying a candidate value not included in the subset; determining a distance between the candidate value not included in the subset and each seed cluster; and generating a cluster including the candidate value not included in the subset and a seed cluster having less than a threshold distance to the candidate value not included in the subset.

5. The method of claim 4, wherein the distance between the candidate value not included in the subset a seed cluster is based on semantic distances between the not included in the subset and candidate values included in the seed cluster.

6. The method of claim 4, wherein the distance between the candidate value not included in the subset a seed cluster is based on syntactic distances between the not included in the subset and candidate values included in the seed cluster.

7. The method of claim 1, wherein generating clusters of candidate values from distances between candidate values not included in the subset and candidate values of the subset associated with one or more attributes comprises: identifying seed clusters as candidate values of the subset associated each associated with a common attribute; identifying a candidate value not included in the subset; determining a distance between the candidate value not included in the subset and each seed cluster; and generating a cluster including the candidate value not included in the subset and a seed cluster having a minimum distance to the candidate value not included in the subset.

8. The method of claim 7, wherein the distance between the candidate value not included in the subset a seed cluster is based on semantic distances between the not included in the subset and candidate values included in the seed cluster.

9. The method of claim 7, wherein the distance between the candidate value not included in the subset a seed cluster is based on syntactic distances between the not included in the subset and candidate values included in the seed cluster.

10. The method of claim 1, further comprising: applying the trained attribute extraction model to unstructured data describing additional items; displaying one or more of the additional items to one or more users using predicted values for one or more of the additional items; receiving feedback from a user to whom the one or more additional items were displayed; and updating the training data based on the received feedback.

11. The method of claim 1, wherein receiving input from one or more users manually reviewing the generated clusters for accuracy comprises: receiving an input from a user identifying a generated cluster corresponding to an attribute and an indication whether candidate values included in the generated cluster are accurate.

12. The method of claim 11, wherein the input identifies one or more candidate values to remove from the generated cluster.

13. A non-transitory computer readable medium having instructions encoded thereon that, when executed by a processor, cause the processor to: obtain unstructured data describing items for display by an online concierge system, the unstructured data including a name of each item; identify a set of items each having a common characteristic; extract candidate values for attributes as segments from names of each item of the set, each candidate value associated with a frequency with which the segment occurs in the set of items; identify a subset of candidate values based on frequency of occurrence in the set of items; generate a seed set of candidate values for one or more attributes from inputs received from one or more users associating candidate values of the subset with one or more attributes; generate clusters of candidate values from distances between candidate values not included in the subset and candidate values of the subset associated with one or more attributes, each cluster of candidate value corresponding to an attribute and including candidate values that are potential values for the attribute; receive input from one or more users manually reviewing the generated clusters for accuracy; apply one or more labels to each item of the set, a label applied to an item of the set indicating a candidate value corresponding to an attribute matching a segment extracted from a name of the item of the set; generate training data including a plurality of examples, each example including an identifier of the item of the set and the one or more labels applied to the item of the set; and train an attribute extraction model to predict values for one or more attributes of an item from unstructured data describing the item by applying the attribute extraction model to the plurality of examples of the training data.

14. The non-transitory computer readable medium of claim 13, wherein identify the subset of candidate values based on frequency of occurrence in the set of items comprises: select candidate values having at least a threshold frequency of occurrence in the set of items.
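The seed-cluster assignment in claims 4 through 9 and the labeling and training-data steps of claim 1 can be illustrated with a short Python sketch. Several choices here are assumptions rather than anything the claims specify: syntactic distance is approximated as one minus the `difflib.SequenceMatcher` similarity ratio, the distance from a value to a seed cluster is taken as the minimum distance to any member, the threshold variant picks the nearest cluster that falls under the threshold, and a label is applied when a cluster value appears as a whole-word segment of the item name. All function names, attribute names, and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def syntactic_distance(a, b):
    """Assumed syntactic distance: 1 minus the SequenceMatcher similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def cluster_distance(value, seed_values):
    """Assumed value-to-cluster distance: distance to the closest seed member."""
    return min(syntactic_distance(value, s) for s in seed_values)


def assign_nearest(value, seeds):
    """Minimum-distance variant: return the attribute of the nearest seed cluster."""
    return min(seeds, key=lambda attr: cluster_distance(value, seeds[attr]))


def assign_within_threshold(value, seeds, max_distance):
    """Threshold variant: nearest attribute if under max_distance, else None."""
    attr = assign_nearest(value, seeds)
    return attr if cluster_distance(value, seeds[attr]) < max_distance else None


def label_item(name, clusters):
    """Return {attribute: value} for cluster values that appear in the item name."""
    padded = f" {name.lower()} "
    labels = {}
    for attr, values in clusters.items():
        matches = [v for v in sorted(values) if f" {v} " in padded]
        if matches:
            labels[attr] = max(matches, key=len)  # prefer the longest match
    return labels


def build_examples(items, clusters):
    """Pair each item identifier with the labels applied to that item."""
    return [
        {"item_id": item_id, "labels": label_item(name, clusters)}
        for item_id, name in items
    ]


# Hypothetical seed clusters created by reviewers from frequent candidates.
seeds = {
    "fat_content": {"whole", "skim"},
    "flavor": {"chocolate", "vanilla"},
}
nearest = assign_nearest("chocolatey", seeds)
rejected = assign_within_threshold("xyz", seeds, max_distance=0.5)
examples = build_examples([(101, "Chocolate Whole Milk")], seeds)
```

In this sketch a value such as "xyz" that is far from every seed cluster is left unassigned by the threshold variant, which is where the manual cluster review described in claims 11 and 12 would come in. The resulting `examples` list has the shape claim 1 describes for training data: an item identifier plus the labels applied to that item. A semantic distance, as in claims 5 and 8, would replace `syntactic_distance` with a comparison of embeddings, which is omitted here.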
based on feedback of a supervisor · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Matching criteria, e.g. proximity measures · CPC title
utilising user interfaces specially adapted for shopping · CPC title
Clustering techniques · CPC title