Clustering historical images using a convolutional neural net and labeled data bootstrapping

US10318846B2 · US · B2

Patent metadata
Publication number: US-10318846-B2
Application number: US-201615393008-A
Country: US
Kind code: B2
Filing date: Dec 28, 2016
Priority date: Dec 28, 2016
Publication date: Jun 11, 2019
Grant date: Jun 11, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection: read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for classifying historical images. A feature extractor may create feature vectors corresponding to a plurality of images. A first classification of the plurality of images may be performed based on the plurality of feature vectors, which may include assigning a label to each of the plurality of images and assigning a probability for each of the assigned labels. The assigned probability for each of the assigned labels may be related to a statistical confidence that a particular assigned label is correctly assigned to a particular image. A subset of the plurality of images may be displayed to a display device. An input corresponding to replacement of an incorrect label with a corrected label for a certain image may be received from a user. A second classification of the plurality of images based on the input from the user may be performed.
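The abstract describes a loop: extract feature vectors, classify with a per-label probability, show the reviewer the uncertain images, take corrections, and classify again. Below is a minimal Python sketch of one such round under stated assumptions that are not from the patent. The feature vectors are short lists of floats standing in for CNN output (claim 5 mentions 4096 numbers per vector; two are used here for readability). The "feature classifier" is a hypothetical nearest-centroid model whose softmax over negative distances serves as the probability. The names `fit_centroids`, `classify`, `bootstrap_round`, the 0.9 threshold and the labels are all illustrative.

```python
import math


def fit_centroids(vectors, labels):
    """Hypothetical feature classifier: the mean feature vector of each label."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(vectors, centroids):
    """Return (label, probability) per image; the probability is a softmax
    over negative Euclidean distances to the label centroids."""
    results = []
    for vec in vectors:
        scores = {label: -math.dist(vec, c) for label, c in centroids.items()}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {label: math.exp(s - top) for label, s in scores.items()}
        best = max(exps, key=exps.get)
        results.append((best, exps[best] / sum(exps.values())))
    return results


def bootstrap_round(seed_vecs, seed_labels, image_vecs, review, threshold=0.9):
    """One classify / review / correct / reclassify round.

    `review` stands in for the human reviewer (an assumption): it receives
    the indices of low-probability images and returns {index: corrected_label}.
    """
    first = classify(image_vecs, fit_centroids(seed_vecs, seed_labels))
    uncertain = [i for i, (_, p) in enumerate(first) if p < threshold]
    corrections = review(uncertain)
    # Add the corrected images to the labeled data and refit the classifier.
    train_vecs = seed_vecs + [image_vecs[i] for i in corrections]
    train_labels = seed_labels + list(corrections.values())
    second = classify(image_vecs, fit_centroids(train_vecs, train_labels))
    return first, uncertain, second


# Usage with made-up 2-D "feature vectors" and hypothetical labels.
seeds = [[0.0, 0.0], [10.0, 10.0]]
seed_labels = ["portrait", "document"]
images = [[0.5, 0.2], [9.5, 10.1], [5.0, 5.2]]
first, uncertain, second = bootstrap_round(
    seeds, seed_labels, images, review=lambda idx: {i: "portrait" for i in idx})
```

In this toy run only the third image falls below the threshold; once the reviewer relabels it, the refitted centroids move and the second pass assigns the corrected label to that image. A nearest-centroid model is used only because it refits in a few lines; the patent does not say which classifier it uses.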

First claim

Opening claim text (preview).

What is claimed is:

1. A method for classifying a plurality of images comprising: creating, by a feature extractor, a plurality of feature vectors corresponding to the plurality of images; performing, by a feature classifier, a first classification of the plurality of images based on the plurality of feature vectors, wherein performing the first classification includes: assigning at least one of a plurality of labels to each of the plurality of images; and assigning a first probability for each of the assigned labels, wherein the assigned first probability for each of the assigned labels is related to a statistical confidence that a particular assigned label is correctly assigned to a particular image; determining a subset of probabilities of the assigned first probabilities, wherein the subset of probabilities includes all of the assigned first probabilities that are less than an upper probability threshold; determining a subset of the plurality of images corresponding to the subset of probabilities; outputting, to a display device, the subset of the plurality of images corresponding to the subset of probabilities; receiving user input corresponding to replacement of an incorrect label with a corrected label for a certain image of the subset of the plurality of images; receiving user input corresponding to a confidence level associated with the corrected label; adjusting the feature classifier using the corrected label and the confidence level associated with the corrected label; and performing, by the adjusted feature classifier, a second classification of the plurality of images based on the plurality of feature vectors, wherein performing the second classification includes: assigning at least one of the plurality of labels to each of the plurality of images, including assigning the corrected label to the certain image; and assigning a second probability for each of the assigned labels.

2. The method of claim 1, wherein the feature extractor is a convolutional neural network (CNN), the CNN having been previously trained and the CNN being compatible with the plurality of images such that the plurality of images are receivable as inputs by the CNN.

3. The method of claim 1, wherein the plurality of images are historical images.

4. The method of claim 1, further comprising: determining a second subset of probabilities of the assigned second probabilities; determining a second subset of the plurality of images corresponding to the second subset of probabilities; outputting, to the display device, the second subset of the plurality of images; and receiving user input corresponding to replacement of a second incorrect label with a second corrected label for a second certain image of the second subset of the plurality of images.

5. The method of claim 1, wherein each of the plurality of feature vectors comprise 4096 numbers.

6. The method of claim 1, wherein the subset of probabilities includes all of the assigned first probabilities that are greater than a lower probability threshold and less than the upper probability threshold.

7. The method of claim 1, further comprising: receiving user input corresponding to creation of a new label, wherein the new label is added to the plurality of labels.

8. A computer readable storage media comprising instructions to cause one or more processors to perform operations comprising: creating, by a feature extractor, a plurality of feature vectors corresponding to the plurality of images; performing, by a feature classifier, a first classification of the plurality of images based on the plurality of feature vectors, wherein performing the first classification includes: assigning at least one of a plurality of labels to each of the plurality of images; and assigning a first probability for each of the assigned labels, wherein the assigned first probability for each of the assigned labels is related to a statistical confidence that a particular assigned label is correctly assigned to a particular image; determining a subset of probabilities of the assigned first probabilities, wherein the subset of probabilities includes all of the assigned first probabilities that are less than an upper probability threshold; determining a subset of the plurality of images corresponding to the subset of probabilities; outputting, to a display device, the subset of the plurality of images corresponding to the subset of probabilities; receiving user input corresponding to replacement of an incorrect label with a corrected label for a certain image of the subset of the plurality of images; receiving user input corresponding to a confidence level associated with the corrected label; adjusting the feature classifier using the corrected label and the confidence level associated with the corrected label; and performing, by the adjusted feature classifier, a second classification of the plurality of images based on the plurality of feature vectors, wherein performing the second classification includes: assigning at least one of the plurality of labels to each of the plurality of images, including assigning the corrected label to the certain image; and assigning a second probability for each of the assigned labels.

9. The computer readable storage media of claim 8, wherein the feature extractor is a convolutional neural network (CNN), the CNN having been previously trained and the CNN being compatible with the plurality of images such that the plurality of images are receivable as inputs by the CNN.

10. The computer readable storage media of claim 8, wherein the plurality of images are historical images.

11. The computer readable storage media of claim 8, further comprising instructions to cause one or more processors to perform operations further comprising: determining a second subset of probabilities of the assigned second probabilities; determining a second subset of the plurality of images corresponding to the second subset of probabilities; outputting, to the display device, the second subset of the plurality of images; and receiving user input corresponding to replacement of a second incorrect label with a second corrected label for a second certain image of the second subset of the plurality of images.

12. The computer readable storage media of claim 8, wherein each of the plurality of feature vectors comprise 4096 numbers.

13. The computer readable storage media of claim 8, wherein the subset of probabilities includes all of the assigned first probabilities that are greater than a lower probability threshold and less than the upper probability threshold.

14. The computer readable storage media of claim 8, further comprising instructions to cause one or more processors to perform operations further comprising: receiving user input corresponding to creation of a new label, wherein the new label is added to the plurality of labels.

15. A system for classifying a plurality of images, the system comprising: one or more processors; a display device in communication with the one or more processors; one or more computer readable storage mediums comprising instructions to cause the one or more processors to perform operations comprising: creating, by a feature extractor, a plurality of feature vectors corresponding to the plurality of images; performing, by the feature classifier, a first classification of the plurality of images based on the plurality of feature vectors, wherein performing the first classification includes: assigning at least one of a plurality of labels to each of the plurality of images; and assigning a first prob…
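The claims add details the abstract leaves out: images are selected for review when their probability is below an upper threshold (claim 1) or inside a lower/upper band (claim 6), the reviewer attaches a confidence level to each correction and the classifier is adjusted using it (claim 1), and the reviewer may create a new label (claim 7). The Python sketch below is one possible reading, not the patent's implementation. It assumes the same hypothetical nearest-centroid classifier as above and weights each corrected example by the reviewer's confidence. The functions `fit_weighted_centroids`, `classify` and `select_for_review`, the example probabilities and the labels are all illustrative.

```python
import math


def fit_weighted_centroids(examples):
    """Hypothetical classifier update: weighted mean vector per label.

    `examples` holds (vector, label, weight) triples; seed labels use weight
    1.0 and reviewer corrections use the reviewer's confidence in (0, 1].
    A label seen for the first time simply gets its own centroid.
    """
    sums, totals = {}, {}
    for vec, label, weight in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += weight * v
        totals[label] = totals.get(label, 0.0) + weight
    return {label: [s / totals[label] for s in acc] for label, acc in sums.items()}


def classify(vectors, centroids):
    """(label, probability) per vector via softmax over negative distances."""
    out = []
    for vec in vectors:
        scores = {lab: -math.dist(vec, c) for lab, c in centroids.items()}
        top = max(scores.values())
        exps = {lab: math.exp(s - top) for lab, s in scores.items()}
        best = max(exps, key=exps.get)
        out.append((best, exps[best] / sum(exps.values())))
    return out


def select_for_review(probabilities, upper, lower=None):
    """Indices with probability below `upper` (claim 1) and, when `lower`
    is given, also above `lower` (claim 6)."""
    return [i for i, p in enumerate(probabilities)
            if p < upper and (lower is None or p > lower)]


probs = [0.98, 0.55, 0.72, 0.30, 0.95]
upper_only = select_for_review(probs, upper=0.9)       # [1, 2, 3]
band = select_for_review(probs, upper=0.9, lower=0.5)  # [1, 2]

examples = [
    ([0.0, 0.0], "portrait", 1.0),
    ([10.0, 10.0], "document", 1.0),
    ([4.0, 4.0], "portrait", 0.5),  # correction the reviewer is 50% sure of
    ([0.0, 10.0], "map", 0.8),      # correction that creates a new label
]
cents = fit_weighted_centroids(examples)
labels = [lab for lab, _ in classify([[1.0, 9.0], [1.0, 1.0]], cents)]
```

The low-confidence correction pulls the "portrait" centroid only a third of the way toward the corrected image, to (4/3, 4/3), instead of halfway as an unweighted mean would. The lower threshold in claim 6 can keep images the classifier has little evidence for at all out of the review queue, which is one plausible motivation for the band.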

Assignees

Ancestry Com Operations Inc

Inventors

Classifications

  • the supervisor being a human, e.g. interactive learning with a human teacher · CPC title

  • G06V10/82 · Primary

    using neural networks · CPC title

  • Interactive pattern learning with a human teacher · CPC title

  • G06K9/6254 · Primary

    Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10318846B2 cover?
Systems and methods for classifying historical images. A feature extractor may create feature vectors corresponding to a plurality of images. A first classification of the plurality of images may be performed based on the plurality of feature vectors, which may include assigning a label to each of the plurality of images and assigning a probability for each of the assigned labels. The assigned …
Who is the assignee on this patent?
Ancestry Com Operations Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 11, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).