Machine-in-the-loop, image-to-video computer vision bootstrapping

US10740394B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10740394-B2
Application number  US-201815941437-A
Country             US
Kind code           B2
Filing date         Mar 30, 2018
Priority date       Jan 18, 2018
Publication date    Aug 11, 2020
Grant date          Aug 11, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are systems and methods for improving interactions with and between computers in content searching, hosting and/or providing systems supported by or configured with devices, servers and/or platforms. The disclosed systems and methods provide a novel machine-in-the-loop, image-to-video bootstrapping framework that harnesses a training set built upon an image dataset and a video dataset in order to efficiently produce an accurate training set to be applied to frames of videos. The disclosed systems and methods reduce the amount of time required to build the training dataset, and also provide mechanisms to apply the training dataset to any type of content and for any type of recognition task.
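The bootstrapping idea in the abstract can be illustrated with a toy sketch. It is not the patented implementation: the `Frame` type, the cosine-similarity scoring, the `min_score` cutoff, and the function names are all hypothetical stand-ins for the object detection software the patent refers to. The sketch only shows the data flow, in which images seed the first pass over video frames and annotated frames then help annotate the next video set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for a sampled video frame."""
    video_id: str
    index: int
    features: tuple  # toy feature vector; a real system would use pixels or embeddings


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def annotate_frames(exemplars, frames, label, min_score=0.9):
    """Annotate every frame whose best similarity to any exemplar reaches min_score."""
    annotated = []
    for frame in frames:
        score = max(cosine(frame.features, e) for e in exemplars)
        if score >= min_score:
            annotated.append((frame, label, score))
    return annotated


def bootstrap(image_features, video_rounds, label, min_score=0.9):
    """Seed from image features, then let each round's annotated frames guide the next."""
    exemplars = list(image_features)
    training_set = []
    for frames in video_rounds:
        new = annotate_frames(exemplars, frames, label, min_score)
        training_set.extend(new)
        # Annotated video frames become extra exemplars for the next video set.
        exemplars.extend(frame.features for frame, _, _ in new)
    return training_set
```

In this toy setup, a frame that is too dissimilar from the seed images can still be annotated in a later round if it resembles a frame annotated earlier, which is the intuition behind growing the training set from images to video.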

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising the steps of: receiving, at a computing device, a search query comprising a search term; searching, via the computing device, a collection of images, and based on said searching, identifying a set of images, said set of images comprising content depicting said search term; searching, via the computing device, a collection of videos, and based on said searching, identifying a set of videos, each video in said set of videos comprising at least one video frame comprising content depicting said search term; executing, via the computing device, object detection software on said image set and said video set, said execution comprising analyzing the image set and identifying information related to said content that depicts said search term within each image in the image set, and based on said analysis, performing visual object detection on frames of the videos in the video set based on the identified information from said image set; generating, via the computing device, a set of annotated video frames based on said visual object detection, said generation comprising annotating video frames of the videos in the video set that comprise said content depicting said search term with information indicating that a depiction of said search term is depicted therein; and training, via the computing device, visual recognizer software with said generated set of annotated video frames.

2. The method of claim 1, further comprising: searching said collection of videos, and based on said searching, identifying a second video set of videos, each video in said second video set comprising at least one video frame comprising content depicting said search term; executing said object detection software on said second video set and said set of annotated video frames, said execution comprising performing visual object detection on frames of the videos in the second video set based on the annotated information in said annotated video frame set; generating a second set of annotated video frames based on said visual object detection, said generation comprising annotating a set of video frames of the videos in the second video set that comprise said content depicting said search term with information indicating that a depiction of said search term is depicted therein; and adding said second set of annotated video frames to a training dataset comprising the annotated video frames.

3. The method of claim 2, further comprising training the visual recognizer software based on said addition of the second set of annotated video frames to the training dataset.

4. The method of claim 1, further comprising: causing a video file to be rendered over a network on a device of a user; analyzing the video file as it is rendered on the user device, said analysis comprising identifying a frame set of the video that is currently being rendered; applying the trained visual recognizer software to said identified frame set; and identifying, based on said application of the trained visual recognizer software, an object depicted within said frame set that corresponds to said search term.

5. The method of claim 4, further comprising: searching, over a network, for content associated with said object; identifying, based on said search, said content; and communicating said content for display when said object is displayed within said video said content display comprising information augmenting a depiction of the object within said video.

6. The method of claim 1, further comprising: sampling each of the videos identified in said video set, and based on said sampling, identifying a frame set for each of the videos in said video set.

7. The method of claim 6, wherein said sampling comprises applying neural network region proposal software on said videos in said video set.

8. The method of claim 1, further comprising: determining a confidence value for each annotated video frame, said confidence value indicating a quality of the object in each video frame.

9. The method of claim 8, wherein said annotated video frame is automatically added to a training dataset when said confidence value for said frame satisfies a threshold.

10. The method of claim 8, wherein said annotated video frame is verified by an editor when said confidence value does not satisfy a threshold, wherein said annotated video frame is added to a training dataset after said verification.

11. The method of claim 1, further comprising: downloading and storing said image set upon identifying said image set from said image search; and downloading and storing said video set upon identifying said video set from said video search.

12. A non-transitory computer-readable storage medium tangibly encoded with computer-executable instructions, that when executed by a processor associated with a computing device, performs a method comprising: receiving, at the computing device, a search query comprising a search term; searching, via the computing device, a collection of images, and based on said searching, identifying a set of images, said set of images comprising content depicting said search term; searching, via the computing device, a collection of videos, and based on said searching, identifying a set of videos, each video in said set of videos comprising at least one video frame comprising content depicting said search term; executing, via the computing device, object detection software on said image set and said video set, said execution comprising analyzing the image set and identifying information related to said content that depicts said search term within each image in the image set, and based on said analysis, performing visual object detection on frames of the videos in the video set based on the identified information from said image set; generating, via the computing device, a set of annotated video frames based on said visual object detection, said generation comprising annotating video frames of the videos in the video set that comprise said content depicting said search term with information indicating that a depiction of said search term is depicted therein; and training, via the computing device, visual recognizer software with said generated set of annotated video frames.

13. The non-transitory computer-readable storage medium of claim 12, further comprising: searching said collection of videos, and based on said searching, identifying a second video set of videos, each video in said second video set comprising at least one video frame comprising content depicting said search term; executing said object detection software on said second video set and said set of annotated video frames, said execution comprising performing visual object detection on frames of the videos in the second video set based on the annotated information in said annotated video frame set; generating a second set of annotated video frames based on said visual object detection, said generation comprising annotating a set of video frames of the videos in the second video set that comprise said content depicting said search term with information indicating that a depiction of said search term is depicted therein; and adding said second set of annotated video frames to a training dataset comprising the annotated video frames.

14. The non-transitory computer-readable storage medium of claim 13, further comprising training the visual recognizer software based on said addition of the second set of annotated video frames to the training dataset.

15. The non-transitory computer-readable storage medium of claim 12, further comprising: causing a video
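Claims 8 to 10 describe the machine-in-the-loop step: a frame whose confidence value satisfies a threshold is added to the training dataset automatically, and any other frame goes to an editor first. The sketch below is one hedged reading of that routing, not the patent's implementation. The names `AnnotatedFrame`, `route_annotations`, and `editor_verifies` are hypothetical, "satisfies a threshold" is assumed to mean greater than or equal, and discarding frames the editor does not verify is also an assumption, since the claims do not say what happens to them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AnnotatedFrame:
    """Hypothetical record for one annotated video frame."""
    frame_id: str
    label: str
    confidence: float  # assumed to lie in [0, 1]


def route_annotations(
    frames: List[AnnotatedFrame],
    threshold: float,
    editor_verifies: Callable[[AnnotatedFrame], bool],
) -> Tuple[List[AnnotatedFrame], List[AnnotatedFrame]]:
    """Split frames into (training_set, rejected) by confidence and editor review."""
    training_set: List[AnnotatedFrame] = []
    rejected: List[AnnotatedFrame] = []
    for frame in frames:
        if frame.confidence >= threshold:
            # Confident enough: added automatically, no human involved.
            training_set.append(frame)
        elif editor_verifies(frame):
            # Below threshold: added only after the editor verifies it.
            training_set.append(frame)
        else:
            # Assumption: frames the editor does not verify are set aside.
            rejected.append(frame)
    return training_set, rejected
```

Here `editor_verifies` stands in for a human review step; in practice it might enqueue the frame for a review tool and return the editor's decision. Raising `threshold` sends more frames to the editor, trading annotation speed for label quality.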

Assignees

Oath Inc

Inventors

Classifications

  • using objects detected or recognised in the video content · CPC title

  • G06V10/774 · Primary

    Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Validation; Performance evaluation · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Validation; Performance evaluation; Active pattern learning techniques · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Oath Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/7837. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 11, 2020 (B2). Legal status and post-grant events are not shown on this page.