Regularized iterative collaborative feature learning from web and user behavior data

US11042798B2 · US · B2

Patent metadata
Publication number: US-11042798-B2
Application number: US-201615082877-A
Country: US
Kind code: B2
Filing date: Mar 28, 2016
Priority date: Feb 4, 2016
Publication date: Jun 22, 2021
Grant date: Jun 22, 2021

Abstract

Certain embodiments involve learning features of content items (e.g., images) based on web data and user behavior data. For example, a system determines latent factors from the content items based on data including a user's text query or keyword query for a content item and the user's interaction with the content items based on the query (e.g., a user's click on a content item resulting from a search using the text query). The system uses the latent factors to learn features of the content items. The system uses a previously learned feature of the content items for iterating the process of learning features of the content items to learn additional features of the content items, which improves the accuracy with which the system is used to learn other features of the content items.
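The abstract describes a loop: derive latent factors from query and click data, treat them as learned features, and feed what was learned back into the data set for another pass. Below is a minimal Python sketch of that loop under stated assumptions. It is not Adobe's implementation. The factorizer is plain regularized stochastic gradient descent, the item latent vectors stand in for the learned "features", and "feeding back" is modeled as writing the factorization's reconstruction into the cells that were originally missing. The names `factorize`, `predict`, and `iterative_feature_learning` and the toy click matrix are hypothetical.

```python
import random


def predict(U, V, i, j):
    """Reconstructed affinity between item i and query j."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def factorize(matrix, rank=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Split an items x queries matrix into item latent factors U and query
    latent factors V so that matrix[i][j] ~ dot(U[i], V[j]) on observed cells.

    Missing cells are None and are skipped. Regularized stochastic gradient
    descent is an assumed stand-in for the unspecified factorization method.
    """
    rng = random.Random(seed)
    n_items, n_queries = len(matrix), len(matrix[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_queries)]
    observed = [(i, j) for i in range(n_items) for j in range(n_queries)
                if matrix[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = matrix[i][j] - predict(U, V, i, j)
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                U[i][k] += lr * (err * v_jk - reg * u_ik)
                V[j][k] += lr * (err * u_ik - reg * v_jk)
    return U, V


def iterative_feature_learning(matrix, iterations=2, **factorize_kwargs):
    """Each round factorizes the current data set, then writes the model's
    reconstruction into the originally missing cells, so the next round
    starts from the modified (completed) data set."""
    data = [row[:] for row in matrix]  # never mutate the caller's matrix
    missing = [(i, j) for i, row in enumerate(matrix)
               for j, x in enumerate(row) if x is None]
    U = []
    for _ in range(iterations):
        U, V = factorize(data, **factorize_kwargs)
        for i, j in missing:
            data[i][j] = predict(U, V, i, j)
    return U, data  # U holds one learned feature vector per content item


# Hypothetical toy data: rows are images, columns are text queries,
# 1 = clicked, 0 = shown but not clicked, None = no data.
clicks = [
    [1, 1, 0, None],
    [1, None, 0, 0],
    [0, 0, 1, 1],
    [None, 0, 1, 1],
]
item_features, completed = iterative_feature_learning(clicks, iterations=2, rank=2)
```

Because images 0 and 1 are clicked for the same queries, the completed cell for image 1 and query 1 should come out higher than the cell for image 0 and query 3. The claims add steps this sketch omits, such as clustering latent factors and training a neural network classifier on the learned features.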

Claims

Claims 1 to 12 (preview; the text breaks off partway through claim 12).

What is claimed is:

1. A method for iterative collaborative feature learning usable for accurately matching image content in queries to corresponding image results, the method comprising:
  obtaining, by a processor, a data set that includes data indicating whether a user providing a text query clicked on a content item resulting from a search of the text query;
  populating, by the processor, at least one missing data entry in the data set;
  determining, by the processor, a first latent factor of the content item from the data set, wherein the first latent factor includes information about a feature of the content item;
  learning, by the processor, a first feature of the content item based on the first latent factor;
  modifying the data set by populating an additional missing data entry in the data set with the first feature;
  iteratively learning, by the processor, a second feature of the content item, wherein each iteration comprises: decomposing the modified data set to identify a second latent factor of the content item based on the modified data set, wherein the decomposing includes factorizing a data matrix that includes the modified data set, the data matrix being decomposed into a content item latent factors matrix and a text query latent factors matrix, wherein the second latent factor of the content item is learned based on a difference between additional latent factors of multiple text queries indicated by the text query latent factors matrix, grouping the first latent factor and the second latent factor into a cluster, learning the second latent factor of the content item based on the cluster, and learning the second feature of the content item based on the second latent factor; and
  training a neural network usable for matching content items in queries by learning to classify a plurality of content items based at least partially on the first or the second feature of the content item.

2. The method of claim 1, further comprising: receiving, by the processor, data associated with a query content item; determining, by the processor, a latent factor of the query content item; learning, by the processor, a feature of the query content item; and outputting, by the processor, a recommendation indicating a recommended content item, wherein the recommended content item is similar to the query content item.

3. The method of claim 1, further comprising: determining, by the processor, whether the user providing the text query clicked on the content item resulting from the search of the text query; and outputting, by the processor, a recommendation to the user indicating a recommended content item, wherein the recommended content item is similar to the content item clicked on by the user.

4. The method of claim 1, wherein populating the at least one missing data entry in the data set includes: determining, by the processor, an amount of missing data in the data set; generating, by the processor, derived data for the data set based on the amount of missing data being below a threshold amount of missing data; and populating, by the processor, the at least one missing data entry in the data set by replacing the at least one missing data entry with the derived data.

5. The method of claim 1, wherein determining the second latent factor of the content item includes: decomposing, by the processor, the data matrix into a plurality of data matrices that include data associated with the first latent factor or the second latent factor, wherein the decomposed data matrix includes the first feature.

6. The method of claim 1, further comprising: grouping, by the processor, the first latent factor and the second latent factor into a plurality of clusters and wherein learning the first feature or the second feature of the content item includes learning the first feature or the second feature based on the plurality of clusters.

7. The method of claim 1, wherein obtaining the data set includes: determining by the processor, that outlier data from the data set includes data in the data set that is above or below a threshold associated with an accuracy with which the neural network classifies the plurality of content items; and removing, by the processor, the outlier data from the data set based on the data in the data set being above or below the threshold.

8. A system comprising: a processing device; and a non-transitory computer-readable medium communicatively coupled to the processing device, wherein the processing device is configured to perform operations comprising:
  obtaining a data set that includes data indicating whether a user providing a text query clicked on a content item resulting from a search of the text query;
  populating at least one missing data entry in the data set;
  determining a first latent factor of the content item from the data set, wherein the first latent factor includes information about a feature of the content item;
  learning a first feature of the content item based on the first latent factor;
  modifying the data set by populating an additional missing data entry in the data set with the first feature;
  iteratively learning a second feature of the content item, wherein each iteration comprises: decomposing the modified data set to identify a second latent factor of the content item based on the modified data set, wherein the decomposing includes factorizing a data matrix that includes the modified data set, the data matrix being decomposed into a content item latent factors matrix and a text query latent factors matrix, wherein the second latent factor of the content item is learned based on a difference between additional latent factors of multiple text queries indicated by the text query latent factors matrix, grouping the first latent factor and the second latent factor into a cluster, learning the second latent factor of the content item based on the cluster, and learning the second features of the content item based on the second latent factor; and
  training a neural network usable for matching content items in queries by learning to classify a plurality of content items based at least partially on the first or the second feature of the content item.

9. The system of claim 8, wherein the processing device is further configured to train the neural network to: learn features of a plurality of content items and classify the plurality of content items based on the learned features.

10. The system of claim 8, wherein the processing device is further configured to: receive data associated with a query content item; determine a latent factor of the query content item; learn a feature of the query content item; and output a recommendation indicating a recommended content item, wherein the recommended content item is similar to the query content item.

11. The system of claim 8, wherein the processing device is further configured to: determine an amount of missing data in the data set; generate derived data for the data set based on the amount of missing data being below a threshold amount of missing data; and populate the at least one missing data entry in the data set by replacing the at least one missing data entry with the derived data.

12. The system of claim 8, wherein the processing device is further configured to: train the neural network to learn features of a plurality of content items and classify the plurality of content items based on the learned features; and remove outlier data from the data set to improve an accuracy with which the trained neural network classifies the plurality of content items, wherein outlier data includes data in the data set that is above or below a thresho…
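Claims 4 and 11 (filling missing entries only when the amount of missing data is below a threshold) and claims 7 and 12 (removing data above or below a threshold) describe simple data-preparation steps. The Python sketch below is one possible reading. It assumes a list-of-lists matrix with None for missing cells and records whose last field is the value to screen. The function names, the row-mean rule for derived data, and the default threshold are hypothetical choices, not taken from the patent.

```python
from statistics import mean


def populate_missing(matrix, max_missing_fraction=0.3):
    """Claim 4 / 11 sketch: measure how much of the data set is missing and,
    only if that amount is below the threshold, replace each missing cell
    (None) with derived data. The derived value here is the mean of the
    row's observed cells (an assumption). The claims do not say what happens
    above the threshold; this sketch returns an unchanged copy."""
    cells = [x for row in matrix for x in row]
    missing_fraction = sum(x is None for x in cells) / len(cells)
    if missing_fraction >= max_missing_fraction:
        return [row[:] for row in matrix]
    filled = []
    for row in matrix:
        observed = [x for x in row if x is not None]
        derived = mean(observed) if observed else 0.0
        filled.append([derived if x is None else x for x in row])
    return filled


def remove_outliers(records, low, high):
    """Claim 7 / 12 sketch: drop records whose value (last field) lies below
    `low` or above `high`. The claims tie the threshold to classifier
    accuracy without saying how; here the bounds are plain parameters
    (for example, tuned on held-out accuracy, an assumption)."""
    return [r for r in records if low <= r[-1] <= high]
```

For a 3 x 3 matrix with 2 of 9 cells missing, the default threshold of 0.3 allows imputation, while a threshold of 0.2 leaves the data untouched.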
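Claim 1 groups latent factors into a cluster (claim 6 extends this to a plurality of clusters), and claims 2, 3, and 10 output a recommended content item that is similar to a query or clicked item. The claims name neither a clustering algorithm nor a similarity measure. The Python sketch below assumes k-means for the grouping and cosine similarity between item latent vectors for the recommendation; `kmeans`, `cosine`, `recommend`, and the example vectors are hypothetical.

```python
import math
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over latent-factor vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(query_vec, item_vecs, exclude=None):
    """Index of the item whose latent vector is most similar (cosine) to
    query_vec, optionally skipping one index (e.g., the query item itself)."""
    candidates = [i for i in range(len(item_vecs)) if i != exclude]
    return max(candidates, key=lambda i: cosine(query_vec, item_vecs[i]))
```

With item vectors `[[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]`, `kmeans(items, k=2)` puts the first two items in one cluster and the last two in the other, and `recommend(items[0], items, exclude=0)` returns 1.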

Assignees

Adobe Inc

Classifications

  • G06N3/08 (primary): Learning methods

  • Combinations of networks

  • Supervised learning

  • Convolutional networks [CNN, ConvNet]

  • Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Frequently asked questions

Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Published on June 22, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.