Generating labeled images

US9852363B1 · US · B1

Patent metadata
Publication number: US-9852363-B1
Application number: US-201614987955-A
Country: US
Kind code: B1
Filing date: Jan 5, 2016
Priority date: Sep 27, 2012
Publication date: Dec 26, 2017
Grant date: Dec 26, 2017

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating labeled images. One of the methods includes selecting a plurality of candidate videos from videos identified in a response to a search query derived from a label for an object category; selecting one or more initial frames from each of the candidate videos; detecting one or more initial images of objects in the object category in the initial frames; for each initial frame including an initial image of an object in the object category, tracking the object through surrounding frames to identify additional images of the object; and selecting one or more images from the one or more initial images and one or more additional images as database images of objects belonging to the object category.
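
The steps in the abstract can be read as a small pipeline: search by label, sample initial frames, detect, track through neighbouring frames, then keep the best images. The following is a minimal sketch under stated assumptions, not the patented implementation. The callables `search`, `load_frames`, `detect`, and `track`, the `LabeledImage` record, and every parameter name and default (`max_videos`, `frame_stride`, `window`, `min_score`) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]          # hypothetical (x, y, width, height)
Hit = Optional[Tuple[Box, float]]        # (box, score), or None if nothing found


@dataclass(frozen=True)
class LabeledImage:
    video_id: str
    frame_index: int
    box: Box
    score: float


def generate_labeled_images(
    label: str,
    search: Callable[[str], Sequence[str]],
    load_frames: Callable[[str], Sequence[Any]],
    detect: Callable[[Any, str], Hit],
    track: Callable[[Box, Any], Hit],
    *,
    max_videos: int = 10,
    frame_stride: int = 30,
    window: int = 5,
    min_score: float = 0.5,
) -> List[LabeledImage]:
    # Assumption: the simplest query "derived from the label" is the label itself.
    candidates = list(search(label))[:max_videos]
    selected: List[LabeledImage] = []
    for video_id in candidates:
        frames = load_frames(video_id)
        # Sample initial frames at a fixed stride (an illustrative choice).
        for i in range(0, len(frames), frame_stride):
            hit = detect(frames[i], label)
            if hit is None:
                continue
            box, score = hit
            images = [LabeledImage(video_id, i, box, score)]
            # Track the detected object forward, then backward, through
            # the surrounding frames until the tracker loses it.
            for step in (1, -1):
                prev_box = box
                for k in range(1, window + 1):
                    j = i + step * k
                    if not 0 <= j < len(frames):
                        break
                    tracked = track(prev_box, frames[j])
                    if tracked is None:
                        break
                    prev_box, tracked_score = tracked
                    images.append(
                        LabeledImage(video_id, j, prev_box, tracked_score)
                    )
            # Keep only images whose score clears the threshold.
            selected.extend(img for img in images if img.score >= min_score)
    return selected
```

Passing the search engine, detector, and tracker in as callables keeps the sketch independent of any particular library. A real system would also need to deduplicate frames reached from more than one initial frame.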

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating labeled images of objects belonging to a particular object category, the method comprising: obtaining data identifying the particular object category; identifying database images of the objects belonging to the particular object category, comprising: generating a search query derived from a label for the particular object category; selecting a plurality of candidate videos from videos identified in response to submission of the search query; selecting one or more initial frames from each of the candidate videos; detecting one or more initial images of objects belonging to the particular object category in the one or more initial frames; for each initial frame including an initial image of an object belonging to the particular object category, tracking the object through surrounding frames to identify one or more additional images of the object; and selecting one or more images from the one or more initial images and one or more additional images as the database images of objects belonging to the particular object category; and storing the database images as images of objects belong to the particular object category.

2. The method of claim 1, wherein storing the database images comprises storing the database images as training data for a machine learning model.

3. The method of claim 2, further comprising: training, using the training data that includes the database images, the machine learning model to classify particular images in videos as being associated with the object category.

4. The method of claim 2, further comprising: training, using the training data that includes the database images, the machine learning model to predict a context of images or videos.

5. The method of claim 2, further comprising: training, using the training data that includes the database images, the machine learning model to receive, as an input, sequences of frames extracted from videos and predict other frames in the videos.

6. The method of claim 1, further comprising: generating an additional search query derived from terms associated with the label for the particular object category; identifying one or more videos identified in response to submission of the additional search query; and merging the plurality of candidate videos and the one or more videos identified in response to submission of the additional search query to generate a second plurality of candidate videos.

7. The method of claim 6, wherein merging the plurality of candidate videos and the one or more videos comprises: determining candidate videos that were included in both (i) the one or more videos identified in response to submission of the additional search query, and (ii) the videos identified in response to submission of the search query; and generating the second plurality of candidate videos by filtering videos from the plurality of candidate videos that were not included in both (i) the one or more videos identified in response to submission of the additional search query, and (ii) the videos identified in response to submission of the search query.

8. The method of claim 1, wherein selecting the one or more images from the one or more initial images and the one or more additional images as the database images of objects belonging to the particular object category comprises: determining one or more subsets of the one or more initial images in the respective selected candidate video; and selecting subsets of the one or more initial images that (i) are spaced apart from each other by a determined number of images in the respective selected candidate videos, and (ii) satisfy a quality threshold for initial images.

9. The method of claim 1, wherein for each initial frame including the initial image of the object belonging to the particular object category, tracking the object through surrounding frames to identify one or more additional images of the object comprises: applying an object tracker to a plurality of bounding boxes within the initial frame to track the initial image of the object belonging to the particular object category; generating scores for the plurality of bounding boxes in response to applying the object tracker; determining a detection score threshold value based on a fraction of previously-processed initial frames for which the highest-scoring bounding box has been found to satisfy a previous detection score threshold value being above a first threshold fraction; and selecting a particular bounding box of the plurality of bounding boxes having the highest score and satisfying the detection score threshold.

10. A system comprising: one or more processors and one or more computer storage media storing instructions that are operable and when executed by the one or more processors, cause the one or more processors to perform operations comprising: obtaining data identifying the particular object category; identifying database images of the objects belonging to the particular object category, comprising: generating a search query derived from a label for the particular object category; selecting a plurality of candidate videos from videos identified in response to submission of the search query; selecting one or more initial frames from each of the candidate videos; detecting one or more initial images of objects belonging to the particular object category in the one or more initial frames; for each initial frame including an initial image of an object belonging to the particular object category, tracking the object through surrounding frames to identify one or more additional images of the object; and selecting one or more images from the one or more initial images and one or more additional images as the database images of objects belonging to the particular object category; and storing the database images as images of objects belong to the particular object category.

11. The system of claim 10, wherein storing the database images comprises storing the database images as training data for a machine learning model.

12. The system of claim 11, wherein the operations further comprise training, using the training data that includes the database images, the machine learning model to: classify particular images in videos as being associated with the object category; predict a context of images or videos; or receive, as an input, sequences of frames extracted from videos and predict other frames in the videos.

13. The system of claim 10, wherein the operations further comprise: generating an additional search query derived from terms associated with the label for the particular object category; identifying one or more videos identified in response to submission of the additional search query; and merging the plurality of candidate videos and the one or more videos identified in response to submission of the additional search query to generate a second plurality of candidate videos.

14. The system of claim 13, wherein merging the plurality of candidate videos and the one or more videos comprises: determining candidate videos that were included in both (i) the one or more videos identified in response to submission of the additional search query, and (ii) the videos identified in response to submission of the search query; and generating the second plurality of candidate videos by filtering videos from the plurality of candidate videos that were not included in both (i) the one or more videos identified in response to submission of the additional search query, and (ii) the videos identified in response to submission of the search query.
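
Two of the dependent claims lend themselves to a short illustration: the merge in claims 7 and 14, which keeps only candidate videos returned by both queries, and the selection in claim 8, which keeps images that are spaced apart and satisfy a quality threshold. The sketch below is one possible reading under stated assumptions, not the claimed implementation. The function names, the `(frame_index, quality)` representation, the greedy left-to-right spacing rule, and the parameters `min_gap` and `quality_threshold` are all hypothetical.

```python
from typing import List, Sequence, Tuple


def merge_candidates(
    candidates: Sequence[str], additional_results: Sequence[str]
) -> List[str]:
    """Keep candidates that the additional query also returned.

    The candidates were already drawn from the first query's results, so
    intersecting with the additional results leaves videos found by both.
    The original candidate order is preserved.
    """
    also_found = set(additional_results)
    return [video_id for video_id in candidates if video_id in also_found]


def select_spaced(
    images: Sequence[Tuple[int, float]],
    min_gap: int,
    quality_threshold: float,
) -> List[Tuple[int, float]]:
    """Greedily pick (frame_index, quality) pairs.

    An image is kept if its quality meets the threshold and it lies at
    least `min_gap` frames after the previously kept image.
    """
    chosen: List[Tuple[int, float]] = []
    last_index = None
    for frame_index, quality in sorted(images):
        if quality < quality_threshold:
            continue
        if last_index is None or frame_index - last_index >= min_gap:
            chosen.append((frame_index, quality))
            last_index = frame_index
    return chosen
```

Spacing the kept images apart is a plausible way to avoid storing near-duplicate frames from the same track. The greedy rule is only one of several strategies that would satisfy such a constraint.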

Assignees

Google Inc

Inventors

Classifications

  • G06F16/739 (Primary)

    in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames · CPC title

  • based on distances to training or reference patterns · CPC title

  • Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9852363B1 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating labeled images. One of the methods includes selecting a plurality of candidate videos from videos identified in a response to a search query derived from a label for an object category; selecting one or more initial frames from each of the candidate videos; detecting one or more initia…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/739. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 26, 2017 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).