Long term object tracker

US9734587B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9734587-B2
Application number  US-201514871955-A
Country             US
Kind code           B2
Filing date         Sep 30, 2015
Priority date       Sep 30, 2015
Publication date    Aug 15, 2017
Grant date          Aug 15, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

In some implementations, a computing device can track an object from a first image frame to a second image frame using a self-correcting tracking method. The computing device can select points of interest in the first image frame. The computing device can track the selected points of interest from the first image frame to the second image frame using optical flow object tracking. The computing device can prune the matching pairs of points and generate a transform based on the remaining matching pairs to detect the selected object in the second image frame. The computing device can generate a tracking confidence metric based on a projection error for each point of interest tracked from the first frame to the second frame. The computing device can correct tracking errors by reacquiring the object when the tracking confidence metric is below a threshold value.
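The last steps of the abstract (fit a transform to the surviving point pairs, score the fit by projection error, reacquire the object when the score is low) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation: optical-flow tracking is taken as already done (`src` holds interest points in the first frame, `dst` their tracked positions in the second), the transform is assumed to be a least-squares 2x3 affine matrix, and the error-to-confidence mapping in `tracking_confidence` (with its `scale` and `threshold` defaults) is hypothetical. The claims say the confidence depends on both the transformation matrix and the average error; this sketch uses the error alone.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]  # 2x3 matrix rows


def _det3(a: List[List[float]]) -> float:
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m: List[List[float]], b: List[float]) -> List[float]:
    """Solve the 3x3 system m @ p = b with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate (e.g. collinear) point set")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]  # copy, then replace one column with b
        for r in range(3):
            mc[r][col] = b[r]
        sol.append(_det3(mc) / d)
    return sol


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Least-squares 2x3 affine matrix mapping src points onto dst points."""
    # Normal equations (A^T A) p = A^T y, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atu[i] += row[i] * u
            atv[i] += row[i] * v
    a, b, c = _solve3(ata, atu)
    d, e, f = _solve3(ata, atv)
    return ((a, b, c), (d, e, f))


def project(t: Affine, p: Point) -> Point:
    (a, b, c), (d, e, f) = t
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)


def average_projection_error(t: Affine, src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean distance between each tracked point and its projection under t."""
    errs = [math.dist(project(t, s), q) for s, q in zip(src, dst)]
    return sum(errs) / len(errs)


def tracking_confidence(avg_err: float, scale: float = 2.0) -> float:
    """Hypothetical mapping: 1.0 at zero error, falling toward 0 as error grows."""
    return 1.0 / (1.0 + avg_err / scale)


def should_reacquire(confidence: float, threshold: float = 0.5) -> bool:
    """Hypothetical threshold test that would trigger object reacquisition."""
    return confidence < threshold
```

With one of four points drifting 4 pixels from where the fitted transform puts it, the average error is 1.0 and this hypothetical mapping gives a confidence of about 0.67, so no reacquisition is triggered. An affine model is only one choice; a homography would also fit the phrase "global transformation matrix".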

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: one or more processors; and receiving, by a computing device, user input identifying an image area of a first frame of a video, the image area including an object to be tracked from the first frame to a second frame of the video by the computing device; selecting, by the computing device, interest points within the image area; filtering, by the computing device, the selected interest points using a non-maximum suppression filter; generating, by the computing device, matching point pairs based on the filtered interest points, where each point pair in the matching point pairs includes a first selected interest point in the first frame and a corresponding predicted point in the second frame, where the predicted point in the second frame is determined by tracking the particular interest point from the first frame to the corresponding predicted point in the second frame; pruning, by the computing device, one or more point pairs in the matching point pairs by clustering the matching point pairs and removing matching point pairs that do not belong to a cluster having a largest number of matching point pairs; and tracking, by the computing device, the object in the first frame to the second frame based on the interest points in the matching point pairs that belong to the largest matching point pair cluster; wherein the non-maximum suppression filter is a double-pass non-maximum suppression filter that uses two concentric radii to filter interest points on a first pass through the selected interest points and a single radius to filter interest points on a second pass through the selected interest points.

2. The method of claim 1, wherein the pruning comprises: determining a distance between the interest point and the corresponding predicted point in each of the corresponding pairs in the matching point pairs; and clustering the matching point pairs by performing k-means clustering based on distances corresponding to each matching point pair to generate two or more clusters of matching point pairs.

3. The method of claim 1, further comprising: generating a global transformation matrix which maps the interest points in the first frame that are included in the largest matching point pair cluster to corresponding points in the second frame.

4. The method of claim 3, further comprising: calculating an average point projection error for the first frame, where the projected error is a distance between a predicted point and an actual point determined using the global transformation matrix; generating a confidence value based on the global transformation matrix and the average projected error for the first frame.

5. The method of claim 4, further comprising: presenting a graphical representation of the confidence value calculated for the first frame of the video on a display of the computing device.

6. The method of claim 4, further comprising: determining that the confidence value is below a threshold value; and in response to determining that the confidence value is below a threshold value, reacquiring the object in the second frame by performing an object recognition method using a scale invariant feature transform descriptor.

7. A non-transitory computer-readable medium including one or more sequences of instructions that, when executed by one or more processors, causes: receiving, by a computing device, user input identifying an image area of a first frame of a video, the image area including an object to be tracked from the first frame to a second frame of the video by the computing device; selecting, by the computing device, interest points within the image area; filtering, by the computing device, the selected interest points using a non-maximum suppression filter; generating, by the computing device, matching point pairs based on the filtered interest points, where each point pair in the matching point pairs includes a first selected interest point in the first frame and a corresponding predicted point in the second frame, where the predicted point in the second frame is determined by tracking the particular interest point from the first frame to the corresponding predicted point in the second frame; pruning, by the computing device, one or more point pairs in the matching point pairs by clustering the matching point pairs and removing matching point pairs that do not belong to a cluster having a largest number of matching point pairs; and tracking, by the computing device, the object in the first frame to the second frame based on the interest points in the matching point pairs that belong to the largest matching point pair cluster; wherein the non-maximum suppression filter is a double-pass non-maximum suppression filter that uses two concentric radii to filter interest points on a first pass through the selected interest points and a single radius to filter interest points on a second pass through the selected interest points.

8. The non-transitory computer-readable medium of claim 7, wherein the instructions that cause pruning include instructions that cause: determining a distance between the interest point and the corresponding predicted point in each of the corresponding pairs in the matching point pairs; and clustering the matching point pairs by performing k-means clustering based on distances corresponding to each matching point pair to generate two or more clusters of matching point pairs.

9. The non-transitory computer-readable medium of claim 7, wherein the instructions cause: generating a global transformation matrix which maps the interest points in the first frame that are included in the largest matching point pair cluster to corresponding points in the second frame.

10. The non-transitory computer-readable medium of claim 9, wherein the instructions cause: calculating an average point projection error for the first frame, where the projected error is a distance between a predicted point and an actual point determined using the global transformation matrix; generating a confidence value based on the global transformation matrix and the average projected error for the first frame.

11. The non-transitory computer-readable medium of claim 10, wherein the instructions cause: presenting a graphical representation of the confidence value calculated for the first frame of the video on a display of the computing device.

12. The non-transitory computer-readable medium of claim 10, wherein the instructions cause: determining that the confidence value is below a threshold value; and in response to determining that the confidence value is below a threshold value, reacquiring the object in the second frame by performing an object recognition method using a scale invariant feature transform descriptor.

13. A computing device comprising: one or more processors; and a non-transitory computer-readable medium including one or more sequences of instructions that, when executed by the one or more processors, causes: receiving, by the computing device, user input identifying an image area of a first frame of a video, the image area including an object to be tracked from the first frame to a second frame of the video by the computing device; selecting, by the computing device, interest points within the image area; filtering, by the computing device, the selected interest points using a non-maximum suppression filter; generating, by the computing device, matching point pairs based on the filtered interest points, where each point pair in the matching point pairs includes a first selected interest point in the first frame and a corresponding predicted point in the seco
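Claims 1, 7, and 13 require a double-pass non-maximum suppression filter that uses "two concentric radii" on the first pass and "a single radius" on the second, but the claim text does not say how the two radii interact. The sketch below is one hypothetical interpretation, not the patented rule: on pass 1 a point is dropped if any stronger point lies within `inner_r`, or if a clearly stronger point (`score * ring_ratio` above its own) lies in the ring between `inner_r` and `outer_r`; pass 2 is ordinary greedy NMS with radius `second_r`. The function name, parameters, and `ring_ratio` default are all assumptions.

```python
import math
from typing import List, Tuple

Candidate = Tuple[float, float, float]  # (x, y, score), e.g. a corner response


def double_pass_nms(points: List[Candidate], inner_r: float, outer_r: float,
                    second_r: float, ring_ratio: float = 0.9) -> List[Candidate]:
    """Two-pass non-maximum suppression; the pass-1 ring rule is hypothetical."""
    # Pass 1, two concentric radii: drop a point if any stronger point lies
    # within inner_r, or if a clearly stronger point (score * ring_ratio above
    # this point's score) lies in the ring between inner_r and outer_r.
    survivors: List[Candidate] = []
    for i, (x, y, s) in enumerate(points):
        suppressed = False
        for j, (x2, y2, s2) in enumerate(points):
            if i == j:
                continue
            d = math.hypot(x - x2, y - y2)
            if (d <= inner_r and s2 > s) or (inner_r < d <= outer_r and s2 * ring_ratio > s):
                suppressed = True
                break
        if not suppressed:
            survivors.append((x, y, s))
    # Pass 2, single radius: greedy NMS, strongest first; keep a point only if
    # no already-kept point lies within second_r of it.
    kept: List[Candidate] = []
    for x, y, s in sorted(survivors, key=lambda p: p[2], reverse=True):
        if all(math.hypot(x - kx, y - ky) > second_r for kx, ky, _ in kept):
            kept.append((x, y, s))
    return kept
```

For example, `double_pass_nms([(0, 0, 10), (1, 0, 5), (5, 0, 9.5), (20, 20, 3)], 2, 6, 3)` keeps `(0, 0, 10)`, `(5, 0, 9.5)`, and `(20, 20, 3)`: the weak point at distance 1 falls inside the inner radius of a stronger one, while the point at distance 5 is not strong enough relative to its neighbor to be cut on the ring. In this interpretation pass 1 never suppresses equal-score neighbors, which is one reason a second single-radius pass is useful.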

Assignees

Apple Inc

Inventors

Classifications

  • G06T7/246 (primary)

    using feature-based methods, e.g. the tracking of corners or segments · CPC title

  • involving reference images or patches · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

  • with fixed number of clusters, e.g. K-means clustering · CPC title

  • Physics · mapped topic
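The K-means classification corresponds to the pruning step of claims 2 and 8: the distance between each interest point and its tracked point is clustered with k-means, and only pairs in the most populated cluster are kept. A minimal sketch, assuming one-dimensional k-means on those distances with `k = 2` and a deterministic evenly spaced initialization (the claims say only "two or more clusters"); `kmeans_1d` and `prune_pairs` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Pair = Tuple[Point, Point]  # (interest point in frame 1, tracked point in frame 2)


def kmeans_1d(values: List[float], k: int = 2, iters: int = 50) -> List[int]:
    """Lloyd's k-means on scalar values; returns one cluster label per value."""
    lo, hi = min(values), max(values)
    # Deterministic initialization: k centers spread evenly from min to max.
    centers = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    return labels


def prune_pairs(pairs: List[Pair], k: int = 2) -> List[Pair]:
    """Keep only the pairs whose displacement falls in the largest cluster."""
    dists = [math.dist(p, q) for p, q in pairs]
    labels = kmeans_1d(dists, k)
    largest = max(range(k), key=labels.count)
    return [pair for pair, lab in zip(pairs, labels) if lab == largest]
```

If four pairs move about 5 pixels and two outliers move about 30, the outliers form the smaller cluster and are discarded. Clustering displacement magnitude alone treats two motions of equal length but opposite direction as one cluster; clustering 2-D displacement vectors would separate them, but the claims name distances, so the sketch follows that.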

Patent family

Related publications grouped by family.

Frequently asked questions

What does patent US9734587B2 cover?
A self-correcting method for tracking a user-selected object between video frames: interest points are tracked with optical flow, inconsistent point pairs are pruned, a transform is fitted to the remaining pairs, and the object is reacquired when a confidence metric based on projection error falls below a threshold.
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/246. Mapped technology areas include Physics.
When was this patent published?
Published August 15, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
This page lists one related publication (a citation in our corpus or a publication sharing the same primary CPC classification).