System and method for large-scale multi-label learning using incomplete label assignments
US-2016140451-A1 · May 19, 2016 · US
US9785866B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9785866-B2 |
| Application number | US-201514602524-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 22, 2015 |
| Priority date | Jan 22, 2015 |
| Publication date | Oct 10, 2017 |
| Grant date | Oct 10, 2017 |
Abstract
Techniques for optimizing multi-class image classification by leveraging negative multimedia data items to train and update classifiers are described. The techniques describe accessing positive multimedia data items of a plurality of multimedia data items, extracting features from the positive multimedia data items, and training classifiers based at least in part on the features. The classifiers may include a plurality of model vectors each corresponding to one of the individual labels. The system may iteratively test the classifiers using positive multimedia data and negative multimedia data and may update one or more model vectors associated with the classifiers differently, depending on whether multimedia data items are positive or negative. Techniques for applying the classifiers to determine whether a new multimedia data item is associated with a topic based at least in part on comparing similarity values with corresponding statistics derived from classifier training are also described.
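The abstract's pipeline has three parts: one model vector per label, per-label statistics gathered from correctly classified positive items, and a decision for new items made by comparing their similarity values against those statistics. The following is a minimal sketch under stated assumptions, not the patented implementation. Feature vectors are plain Python lists, the similarity value is a dot product, and a multi-class perceptron serves as a hypothetical stand-in for the multi-class support vector machine named in claim 8. The function names `train_model_vectors`, `collect_statistics`, and `classify`, the parameter `k`, and the "mean minus k standard deviations" acceptance rule are all illustrative assumptions.

```python
import statistics
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity value: dot product of a model vector and a feature vector."""
    return sum(x * y for x, y in zip(a, b))


def train_model_vectors(features: List[Vector], labels: List[str],
                        epochs: int = 10) -> Dict[str, Vector]:
    """Learn one model vector per label with a multi-class perceptron.

    Hypothetical stand-in for the multi-class SVM named in claim 8.
    """
    dim = len(features[0])
    models = {lab: [0.0] * dim for lab in sorted(set(labels))}
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = max(models, key=lambda lab: dot(models[lab], x))
            if pred != y:
                # Move the true label's vector toward x and the wrong one away.
                models[y] = [w + xi for w, xi in zip(models[y], x)]
                models[pred] = [w - xi for w, xi in zip(models[pred], x)]
    return models


def collect_statistics(models: Dict[str, Vector], features: List[Vector],
                       labels: List[str]) -> Dict[str, Tuple[float, float]]:
    """Per-label (mean, population std) of similarity values on correctly
    classified positive items (cf. the averages and standard deviations of claim 6)."""
    sims: Dict[str, List[float]] = {lab: [] for lab in models}
    for x, y in zip(features, labels):
        scores = {lab: dot(w, x) for lab, w in models.items()}
        if max(scores, key=scores.get) == y:
            sims[y].append(scores[y])
    return {lab: (statistics.mean(v), statistics.pstdev(v))
            for lab, v in sims.items() if v}


def classify(models: Dict[str, Vector], stats: Dict[str, Tuple[float, float]],
             x: Vector, k: float = 2.0) -> Optional[str]:
    """Return the best-scoring label, or None if its similarity falls below
    mean - k * std for that label (an assumed acceptance rule)."""
    scores = {lab: dot(w, x) for lab, w in models.items()}
    best = max(scores, key=scores.get)
    if best not in stats:
        return None
    mean, std = stats[best]
    return best if scores[best] >= mean - k * std else None


# Toy corpus of positive items: two labels in a 2-D feature space.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["cat", "cat", "dog", "dog"]
models = train_model_vectors(features, labels)
stats = collect_statistics(models, features, labels)
print(classify(models, stats, [0.95, 0.05]))  # cat
print(classify(models, stats, [0.3, 0.3]))    # None (no label accepted)
```

Returning `None` mirrors the idea that an item may belong to none of the labels; the choice of a mean-and-deviation threshold is one of several statistic types claim 6 lists, picked here only because it is simple.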
Claims (preview)
What is claimed is:

1. A computer-implemented method comprising: accessing a corpus of multimedia data items, the corpus of multimedia data items including positive multimedia data items and negative multimedia data items, wherein: individual positive multimedia data items of the positive multimedia data items are associated with individual labels of a plurality of labels; and the negative multimedia data items are not associated with any label of the plurality of labels; extracting a first set of features from the individual positive multimedia data items; training a classifier based at least in part on the first set of features, the classifier including a plurality of model vectors each corresponding to one of the individual labels; based at least in part on applying the classifier to one or more of the individual positive multimedia data items, collecting statistics corresponding to each of the individual labels; extracting a second set of features from a new multimedia data item; applying the classifier to the second set of features to determine similarity values corresponding to each of the individual labels; determining that the new multimedia data item is one of the negative multimedia data items; based at least in part on determining that the new multimedia data item is one of the negative multimedia data items, comparing the statistics with the similarity values corresponding to each of the individual labels; and based at least in part on comparing the statistics with the similarity values, updating individual model vectors of the plurality of model vectors.

2. A method as claim 1 recites, further comprising: receiving a second new multimedia data item associated with a first label of the plurality of labels; extracting a third set of features from the second new multimedia data item; applying the classifier to the third set of features; based at least in part on applying the classifier to the third set of features, determining new similarity values corresponding to each of the individual labels; determining that the second new multimedia data item is one of the positive multimedia data items; determining that the classifier classified the second new multimedia data item as being associated with a second label of the plurality of labels, the second label being different from the first label; and adjusting at least two of the individual model vectors.

3. A method as claim 2 recites, wherein adjusting at least two of the individual model vectors comprises: scaling down a first individual model vector of the individual model vectors, the first individual model vector associated with the second label; and scaling up a second individual model vector of the individual model vectors, the second individual model vector associated with the first label.

4. A method as claim 2 recites, further comprising updating the statistics based at least in part on determining the new similarity values.

5. A method as claim 1 recites, wherein updating the individual model vectors comprises: determining that a particular similarity value of the similarity values that corresponds to a particular individual label of the individual labels is greater than a particular statistic of the statistics associated with the particular individual label; and scaling down a particular individual model vector of the individual model vectors, the particular individual model vector corresponding to the particular individual label.

6. A method as claim 1 recites, wherein the statistics comprise one or more of: averages of the similarity values generated when the classifier correctly identifies an individual positive multimedia data item of the individual positive multimedia data items with an individual label of the individual labels; standard deviations of the similarity values generated when the classifier correctly identifies the individual positive multimedia data item with the individual label; kth order statistics of the similarity values generated when the classifier correctly identifies the individual positive multimedia data item with the individual label; or distributions representative of the similarity values generated when the classifier correctly identifies the individual positive multimedia data item with the individual label.

7. A method as claim 1 recites, wherein: the statistics comprise threshold values; and updating the individual model vectors is based at least in part on the similarity values being above the threshold values.

8. A method as claim 1 recites, wherein the classifier is a multi-class support vector machine.

9. A system comprising: one or more processors; and instructions stored in computer storage media executable by the one or more processors to perform operations comprising: accessing a corpus of multimedia data items, the corpus of multimedia data items including positive multimedia data items and negative multimedia data items, wherein: individual positive multimedia data items of the positive multimedia data items are associated with individual labels of a plurality of labels; and the negative multimedia data items are not associated with any label of the plurality of labels; extracting a first set of features from the individual positive multimedia data items; training a classifier based at least in part on the first set of features, the classifier including a plurality of model vectors each corresponding to one of the individual labels; based at least in part on applying the classifier to one or more of the individual positive multimedia data items, collecting statistics corresponding to each of the individual labels; extracting a second set of features from a new multimedia data item; applying the classifier to the second set of features to determine similarity values corresponding to each of the individual labels; determining that the new multimedia data item is one of the negative multimedia data items; based at least in part on determining that the new multimedia data item is one of the negative multimedia data items, comparing the statistics with the similarity values corresponding to each of the individual labels; and based at least in part on comparing the statistics with the similarity values, updating individual model vectors of the plurality of model vectors.

10. A system as claim 9 recites, the operations further comprising: receiving a second new multimedia data item associated with a first label of the plurality of labels; extracting a third set of features from the second new multimedia data item; applying the classifier to the third set of features; based at least in part on applying the classifier to the third set of features, determining new similarity values corresponding to each of the individual labels; determining that the second new multimedia data item is one of the positive multimedia data items; determining that the classifier classified the second new multimedia data item as being associated with a second label of the plurality of labels, the second label being different from the first label; and adjusting at least two of the individual model vectors.

11. A system as claim 10 recites, wherein adjusting at least two of the individual model vectors comprises: scaling down a first individual model vector of the individual model vectors, the first individual model vector associated with the second label; and scaling up a second individual model vector of the individual model vectors, the second individual model vector associated with the first label.

12. A system as claim 10 recites, the operations further comprising updating the statistics based at least in pa…
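The two update paths in the claims differ by item type. For a negative item (claims 5 and 7), every model vector whose similarity exceeds its label's threshold is scaled down. For a misclassified positive item (claim 3), the wrongly chosen label's vector is scaled down and the true label's vector is scaled up. The sketch below is illustrative only, assuming dot-product similarity and list-based vectors. The scale factors `shrink=0.9` and `grow=1.1`, the hand-set `thresholds`, and the helper names `update_on_negative`, `update_on_positive`, and `scale` are hypothetical; the claims do not specify factor values or how thresholds are derived beyond "statistics".

```python
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity value between a model vector and a feature vector."""
    return sum(x * y for x, y in zip(a, b))


def scale(v: Vector, factor: float) -> Vector:
    """Multiply every component of a model vector by a scalar."""
    return [factor * w for w in v]


def update_on_negative(models: Dict[str, Vector], thresholds: Dict[str, float],
                       x: Vector, shrink: float = 0.9) -> Dict[str, Vector]:
    """Claims 5 and 7: scale down each model vector whose similarity to a
    negative (unlabeled) item is above that label's threshold."""
    updated = dict(models)
    for lab, w in models.items():
        if dot(w, x) > thresholds[lab]:
            # With a positive threshold the similarity is positive here,
            # so shrinking w (0 < shrink < 1) lowers it.
            updated[lab] = scale(w, shrink)
    return updated


def update_on_positive(models: Dict[str, Vector], x: Vector, true_label: str,
                       grow: float = 1.1, shrink: float = 0.9) -> Dict[str, Vector]:
    """Claim 3: when a positive item labeled true_label is classified as a
    different label, scale down the wrong label's vector and scale up the
    true label's vector; otherwise leave the models unchanged."""
    scores = {lab: dot(w, x) for lab, w in models.items()}
    predicted = max(scores, key=scores.get)
    if predicted == true_label:
        return models
    updated = dict(models)
    updated[predicted] = scale(models[predicted], shrink)
    updated[true_label] = scale(models[true_label], grow)
    return updated


models = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
thresholds = {"cat": 0.5, "dog": 0.5}  # hypothetical per-label thresholds

after_neg = update_on_negative(models, thresholds, [0.8, 0.2])
print(after_neg)  # {'cat': [0.9, 0.0], 'dog': [0.0, 1.0]}

after_pos = update_on_positive(models, [0.6, 0.5], "dog")
print(after_pos)  # {'cat': [0.9, 0.0], 'dog': [0.0, 1.1]}
```

In the positive example the item is first scored as "cat" (0.6 versus 0.5); after the update it scores 0.54 for "cat" and 0.55 for "dog", so a single multiplicative step is enough to flip this toy case. Both functions return new dictionaries rather than mutating their inputs, which keeps the two update paths easy to compare.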
Classification techniques · CPC title
Machine learning · CPC title
based on the proximity to a decision surface, e.g. support vector machines · CPC title
Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection · CPC title
Clustering techniques · CPC title