Automatic personalized image-based search
US-2024211508-A1 · Jun 27, 2024 · US
US9082047B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9082047-B2 |
| Application number | US-201313971092-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 20, 2013 |
| Priority date | Aug 20, 2013 |
| Publication date | Jul 14, 2015 |
| Grant date | Jul 14, 2015 |
Official abstract text for this publication.
A method for learning visual attribute labels for images includes, from textual comments associated with a corpus of images, identifying a set of candidate textual labels that are predictive of aesthetic scores associated with images in the corpus. The candidate labels in the set are clustered into a plurality of visual attribute clusters based on similarity and each of the clusters assigned a visual attribute label. For each of the visual attribute labels, a classifier is trained using visual representations of images in the corpus and respective visual attribute labels. The visual attribute labels are evaluated, based on performance of the trained classifier. A subset of the visual attribute labels is retained, based on the evaluation. The visual attribute labels can be used in processes such as image retrieval, image labeling, and the like.
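As a rough illustration of the last two steps in the abstract (evaluating each visual attribute label by the performance of its classifier, then retaining a subset), the sketch below scores hypothetical held-out predictions and keeps the labels whose score meets a threshold. The use of AUC as the performance measure, the threshold value, the function names, and the sample data are all assumptions made for illustration; none of them is taken from this page.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed by comparing every
    positive example's score with every negative example's score."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


def retain_labels(held_out, threshold):
    """Keep the attribute labels whose classifier meets the threshold.

    held_out maps a label to (classifier scores, ground-truth 0/1 labels).
    Returns the retained labels and the per-label performance.
    """
    performance = {
        label: auc(scores, truth) for label, (scores, truth) in held_out.items()
    }
    retained = [label for label, value in performance.items() if value >= threshold]
    return retained, performance


# Hypothetical held-out results for two candidate attribute labels.
held_out = {
    "nice colors": ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
    "very nice": ([0.6, 0.2, 0.7, 0.4], [1, 1, 0, 0]),
}
retained, performance = retain_labels(held_out, threshold=0.55)
# "nice colors" is kept; "very nice" falls below the threshold and is dropped.
```

The intuition behind this filter is that a label whose classifier cannot be learned reliably from visual representations is probably not a visual attribute at all, even if it predicts aesthetic scores well in text.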
Opening claim text (preview).
What is claimed is:
1. A method for learning visual attribute labels for images comprising: from textual comments associated with a corpus of images, identifying a set of candidate textual labels that are predictive of aesthetic scores associated with images in the corpus; clustering the candidate labels in the set into a plurality of visual attribute clusters based on similarity and assigning each of the clusters a visual attribute label; for each of the visual attribute labels, training a visual attribute classifier using visual representations of images in the corpus and respective visual attribute labels; evaluating the visual attribute labels based on performance of the trained visual attribute classifiers, the evaluating comprising comparing performance of each of the visual attribute classifiers with a predefined threshold; and retaining a subset of the visual attribute labels based on the evaluation, including retaining the visual attribute labels for the visual attribute classifiers that meet the performance threshold; wherein at least one of the identifying a set of candidate textual labels, clustering the candidate labels, training the classifier, and evaluating the classifier performance is performed with a processor.
2. The method of claim 1, wherein the identifying of the set of candidate textual labels comprises, for each image, generating a text-based representation of a set of textual features from which the candidate labels are selected.
3. The method of claim 2, wherein the text-based representation of the image is based on the textual features extracted from textual comments associated with the image.
4. The method of claim 3, wherein the identifying of the set of candidate textual labels comprises, for each image in the corpus, generating a document based on textual comments associated with the image in which stop words and punctuation have been removed to generate a sequence of words, and from the sequence of words, extracting a set of textual features, the text-based representation being based on occurrence of the textual features in the document.
5. The method of claim 2, wherein each of the textual features in the text-based representation corresponds to a respective sequence of at least one word extracted from the textual comments associated with the corpus of images.
6. The method of claim 5, wherein at least some of the textual features represented in the text-based representation each correspond to a bigram.
7. The method of claim 2, wherein the identifying of the set of candidate textual labels comprises: optimizing a regression function that outputs a regression coefficient for each the textual features represented in the text-based representation; and ranking at least some of the textual features based on the regression coefficients.
8. The method of claim 7, wherein the regression function is an Elastic Net.
9. The method of claim 1, wherein the assigning each of the clusters a visual attribute label comprises selecting one of the textual features assigned to the cluster as the visual attribute label.
10. The method of claim 1, wherein each of the visual representations comprises a statistical representation of low level features extracted from patches of the respective image.
11. The method of claim 1, further comprising, with the trained classifiers, assigning visual attribute labels to a query image based on a visual representation of the query image.
12. The method of claim 1, further comprising receiving one of the retained visual attribute labels as a query and retrieving images from a collection of images that are labeled with visual attribute labels selected from the set of visual attribute labels.
13. A computer program product comprising a non-transitory storage medium which stores instructions, which when executed by a computer, performs the method of claim 1.
14. A system comprising memory which stores instructions for performing the method of claim 1 and a processor in communication with the memory for executing the instructions.
15. A method for learning visual attribute labels for images comprising: from textual comments associated with a corpus of images, identifying a set of candidate textual labels that are predictive of aesthetic scores associated with images in the corpus, the identifying comprising, for each image, generating a text-based representation of a set of textual features from which the candidate labels are selected, optimizing a regression function that outputs a regression coefficient for each the textual features represented in the text-based representation, the optimizing of the regression function is being based on the text-based representations for images in the corpus and respective aesthetic scores for the images, and ranking at least some of the textual features based on the regression coefficients; clustering the candidate labels in the set into a plurality of visual attribute clusters based on similarity and assigning each of the clusters a visual attribute label; for each of the visual attribute labels, training a classifier using visual representations of images in the corpus and respective visual attribute labels; evaluating the visual attribute labels based on performance of the trained classifier; and retaining a subset of the visual attribute labels based on the evaluation; wherein at least one of the identifying a set of candidate textual labels, clustering the candidate labels, training the classifier, and evaluating the classifier performance is performed with a processor.
16. The method of claim 15, wherein the evaluating the visual attribute labels based on a performance criterion comprises comparing performance of each of the visual attribute classifiers with a predefined threshold and the retaining includes retaining the visual attribute labels for the visual attribute classifiers that meet the performance threshold.
17. A method for learning visual attribute labels for images comprising: from textual comments associated with a corpus of images, identifying a set of candidate textual labels that are predictive of aesthetic scores associated with images in the corpus, the identifying comprising, for each image, generating a text-based representation of a set of textual features from which the candidate labels are selected, optimizing a regression function that outputs a regression coefficient for each the textual features represented in the text-based representation, identifying a first group of positive regression coefficients and a second group of negative regression coefficients, the ranking of the at least some of the textual features comprising separately ranking the first and second groups, and ranking of the at least some of the textual features based on the regression coefficients, comprising separately ranking the first and second groups; clustering the candidate labels in the set into a plurality of visual attribute clusters based on similarity and assigning each of the clusters a visual attribute label; for each of the visual attribute labels, training a classifier using visual representations of images in the corpus and respective visual attribute labels; evaluating the visual attribute labels based on performance of the trained classifier; and retaining a subset of the visual attribute labels based on the evaluation; wherein at least one of the identifying a set of candidate textual labels, clustering the candidate labels, training the classifier, and evaluating the classifier performance is performed with a processor.
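To make the text-side steps of claims 4 to 7 and 17 more concrete, here is a minimal sketch of two pieces: building a per-image document with stop words and punctuation removed and counting unigram and bigram features, and separately ranking features with positive and negative regression coefficients. The stop-word list, the function names, and the coefficient values are hypothetical. The coefficients are assumed to come from an already fitted regression (for example an Elastic Net, as in claim 8); the fitting itself is not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "this", "to", "in", "it"}


def comment_document(comments):
    """Merge an image's comments into one sequence of words,
    dropping punctuation and stop words."""
    words = []
    for comment in comments:
        tokens = re.findall(r"[a-z]+", comment.lower())  # letters only
        words.extend(t for t in tokens if t not in STOP_WORDS)
    return words


def textual_features(words):
    """Count unigram and bigram occurrences in the word sequence.
    Bigrams are formed over the merged sequence, so they can span
    two adjacent comments (a simplification of this sketch)."""
    bigrams = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return Counter(list(words) + bigrams)


def rank_by_sign(coefficients, top_k):
    """Rank features with positive and negative coefficients separately,
    each by decreasing magnitude. Zero coefficients are ignored."""
    positive = sorted(
        (f for f, c in coefficients.items() if c > 0),
        key=lambda f: -coefficients[f],
    )
    negative = sorted(
        (f for f, c in coefficients.items() if c < 0),
        key=lambda f: coefficients[f],
    )
    return positive[:top_k], negative[:top_k]


words = comment_document(["Great colors, and the lighting is soft!", "Too blurry."])
features = textual_features(words)

# Hypothetical coefficients standing in for a fitted regression model.
coefficients = {"great colors": 0.8, "nice": 0.3, "photo": 0.0,
                "snapshot": -0.2, "too blurry": -0.9}
beautiful, ugly = rank_by_sign(coefficients, top_k=1)
```

Ranking the two signs separately means the strongest positive predictors of aesthetic score and the strongest negative predictors each get their own candidate list, rather than competing in a single list ordered by magnitude.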
based on feedback from supervisors · CPC title
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
using classification, e.g. of video objects · CPC title
based on distances to training or reference patterns · CPC title
of still image data · CPC title