Incremental learning framework for object detection in videos

US9805264B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9805264-B2
Application number  US-201514887141-A
Country             US
Kind code           B2
Filing date         Oct 19, 2015
Priority date       Oct 19, 2015
Publication date    Oct 31, 2017
Grant date          Oct 31, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques disclose an incrementally expanding object detection model. An object detection tool identifies, based on an object detection model, one or more objects in a sequence of video frames. The object detection model provides an object space including a plurality of object classes. Each object class includes one or more prototypes. Each object is classified as being an instance of one of the object classes. Each identified object is tracked across at least one of the frames. The object detection tool generates a measure of confidence for that object based on the tracking. Upon determining that the measure of confidence exceeds a threshold, the object detection tool adds a prototype of the instance to the object detection model.
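The update loop in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name PrototypeModel, the use of Euclidean distance between feature vectors, the novelty score (mean distance from the tracked observations to the nearest prototype of the assigned class), and the choice of the track mean as the new prototype are all assumptions made for the example. The patent itself names a large margin embedding model and Kanade-Lucas-Tomasi tracking in its dependent claims, neither of which is reproduced here.

```python
import math


class PrototypeModel:
    """Object space: each class label maps to a list of prototype feature vectors."""

    def __init__(self, prototypes):
        # Copy the lists so the caller's data is not mutated.
        self.prototypes = {label: list(protos) for label, protos in prototypes.items()}

    def classify(self, feature):
        """Return the label of the class that owns the nearest prototype."""
        return min(
            self.prototypes,
            key=lambda label: min(math.dist(feature, p) for p in self.prototypes[label]),
        )

    def novelty(self, label, track):
        """Assumed confidence measure: mean distance from each tracked
        observation to its nearest prototype in class `label`."""
        dists = [min(math.dist(f, p) for p in self.prototypes[label]) for f in track]
        return sum(dists) / len(dists)

    def update(self, track, threshold):
        """Classify a tracked object; add a prototype if the measure exceeds the threshold.

        `track` is a list of feature vectors for one object across frames.
        Returns (label, added).
        """
        label = self.classify(track[0])
        if self.novelty(label, track) > threshold:
            # Assumed choice: the new prototype is the mean of the tracked features.
            mean = tuple(sum(c) / len(track) for c in zip(*track))
            self.prototypes[label].append(mean)
            return label, True
        return label, False


model = PrototypeModel({"car": [(0.0, 0.0)], "dog": [(10.0, 10.0)]})
print(model.update([(3.0, 0.0), (3.0, 1.0)], threshold=2.0))
print(model.update([(0.1, 0.0)], threshold=2.0))
```

In this sketch the first track lies far enough from the existing "car" prototype to be added as a new prototype, while the second track sits close to an existing one and leaves the model unchanged.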

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: identifying, based on an object detection model, one or more objects in a first sequence of video frames of a plurality of sequences of video frames, wherein the object detection model provides an object space including a plurality of object classes, wherein each object is classified as being an instance of one of the classes, and wherein each object class includes one or more prototypes representing an instance of an object associated with the object class; and for each identified object: tracking the object across at least one of the frames, generating a measure of confidence for the object based on the tracking, wherein the measure of confidence indicates a degree that the object does not correspond to any of the one or more prototypes currently associated with the object class, and upon determining that the measure of confidence exceeds a threshold, adding a prototype representative of the instance to the object detection model.

2. The method of claim 1, further comprising, upon determining that the measure of confidence does not exceed the threshold: identifying one of the prototypes that the object corresponds to; and reinforcing the identified object.

3. The method of claim 1, further comprising: for each successive sequence of video frames in the plurality of sequences, updating the object detection model based on one or more objects in the sequence of video frames identified based on the object detection model.

4. The method of claim 1, wherein the object detection model is a large margin embedding (LME)-based model.

5. The method of claim 1, wherein tracking the object across the frames comprises performing Kanade-Lucas-Tomasi feature tracking on the object.

6. The method of claim 1, wherein identifying the one or more objects in the first sequence of video frames comprises: identifying at least a first object proposal; determining a probability score for the first object proposal based on a selective search and feature vectors corresponding to the first object proposal; and upon determining that the first object proposal has a probability score exceeding a second threshold, adding the first object proposal to a set of detected objects.

7. The method of claim 1, further comprising: initializing the object detection model with a plurality of labeled images.

8. A non-transitory computer-readable storage medium having instructions, which, when executed on a processor, perform an operation comprising: identifying, based on an object detection model, one or more objects in a first sequence of video frames of a plurality of sequences of video frames, wherein the object detection model provides an object space including a plurality of object classes, wherein each object is classified as being an instance of one of the object classes, and wherein each object class includes one or more prototypes representing an instance of an object associated with the object class; and for each identified object: tracking the object across at least one of the frames, generating a measure of confidence for the object based on the tracking, wherein the measure of confidence indicates a degree that the object does not correspond to any of the one or more prototypes currently associated with the object class, and upon determining that the measure of confidence exceeds a threshold, adding a prototype representative of the instance to the object detection model.

9. The computer-readable storage medium of claim 8, wherein the operation further comprises, upon determining that the measure of confidence does not exceed the threshold: identifying one of the prototypes that the object corresponds to; and reinforcing the identified object.

10. The computer-readable storage medium of claim 8, wherein the operation further comprises: for each successive sequence of video frames in the plurality of sequences, updating the object detection model based on one or more objects in the sequence of video frames identified based on the object detection model.

11. The computer-readable storage medium of claim 8, wherein the object detection model is a large margin embedding (LME)-based model.

12. The computer-readable storage medium of claim 8, wherein tracking the object across the frames comprises performing Kanade-Lucas-Tomasi feature tracking on the object.

13. The computer-readable storage medium of claim 8, wherein identifying the one or more objects in the first sequence of video frames comprises: identifying at least a first object proposal; determining a probability score for the first object proposal based on a selective search and feature vectors corresponding to the first object proposal; and upon determining that the first object proposal has a probability score exceeding a second threshold, adding the first object proposal to a set of detected objects.

14. The computer-readable storage medium of claim 8, wherein the operation further comprises: initializing the object detection model with a plurality of labeled images.

15. A system, comprising: a processor; and a memory storing program code, which, when executed on the processor, performs an operation comprising: identifying, based on an object detection model, one or more objects in a first sequence of video frames of a plurality of sequences of video frames, wherein the object detection model provides an object space including a plurality of object classes, wherein each object is classified as being an instance of one of the object classes, and wherein each object class includes one or more prototypes representing an instance of an object associated with the object class, and for each identified object: tracking the object across at least one of the frames, generating a measure of confidence for the object based on the tracking, wherein the measure of confidence indicates a degree that the object does not correspond to any of the one or more prototypes currently associated with the object class, and upon determining that the measure of confidence exceeds a threshold, adding a prototype representative of the instance to the object detection model.

16. The system of claim 15, wherein the operation further comprises, upon determining that the measure of confidence does not exceed the threshold: identifying one of the prototypes that the object corresponds to; and reinforcing the identified object.

17. The system of claim 15, wherein the operation further comprises: for each successive sequence of video frames in the plurality of sequences, updating the object detection model based on one or more objects in the sequence of video frames identified based on the object detection model.

18. The system of claim 15, wherein the object detection model is a large margin embedding (LME)-based model.

19. The system of claim 15, wherein identifying the one or more objects in the first sequence of video frames comprises: identifying at least a first object proposal; determining a probability score for the first object proposal based on a selective search and feature vectors corresponding to the first object proposal; and upon determining that the first object proposal has a probability score exceeding a second threshold, adding the first object proposal to a set of detected objects.

20. The system of claim 15, wherein the operation further comprises: initializing the object detection model with a plurality of labeled images.
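Two of the dependent steps, the proposal filter of claims 6, 13 and 19 and the reinforcement branch of claims 2, 9 and 16, can be illustrated with a short sketch. The function names filter_proposals and reinforce, the score_fn callback, and the rate parameter are hypothetical. The claims do not say how a prototype is reinforced, so nudging the nearest prototype toward the observation is only one assumed reading, and the score function stands in for the selective-search and feature-vector scoring that the claims mention without detailing.

```python
import math


def filter_proposals(proposals, score_fn, second_threshold):
    """Keep object proposals whose probability score exceeds the second threshold."""
    return [p for p in proposals if score_fn(p) > second_threshold]


def reinforce(prototypes, feature, rate=0.1):
    """Identify the prototype nearest to `feature` and move it toward the
    observation by `rate` (an assumed interpretation of "reinforcing").

    Mutates `prototypes` in place and returns the index of the matched prototype.
    """
    idx = min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], feature))
    old = prototypes[idx]
    prototypes[idx] = tuple(o + rate * (f - o) for o, f in zip(old, feature))
    return idx


# Hypothetical proposals carrying a precomputed score.
proposals = [
    {"box": (0, 0, 10, 10), "score": 0.9},
    {"box": (5, 5, 8, 8), "score": 0.2},
]
detected = filter_proposals(proposals, lambda p: p["score"], second_threshold=0.5)
print(len(detected))
```

A strict greater-than comparison is used in both places to mirror the claim wording "exceeds a threshold".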

Assignees

Inventors

Classifications

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title

  • in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44) · CPC title

  • G06V10/255 · Primary

    Detecting or recognising potential candidate objects based on visual cues, e.g. shapes · CPC title

  • based on discrimination criteria, e.g. discriminant analysis · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9805264B2 cover?
Techniques disclose an incrementally expanding object detection model. An object detection tool identifies, based on an object detection model, one or more objects in a sequence of video frames. The object detection model provides an object space including a plurality of object classes. Each object class includes one or more prototypes. Each object is classified as being an instance of one of t…
Who is the assignee on this patent?
Disney Entpr Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/255. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 31, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).