Multi-tracker object tracking
US-2015055821-A1 · Feb 26, 2015 · US
US9734587B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9734587-B2 |
| Application number | US-201514871955-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 30, 2015 |
| Priority date | Sep 30, 2015 |
| Publication date | Aug 15, 2017 |
| Grant date | Aug 15, 2017 |
In some implementations, a computing device can track an object from a first image frame to a second image frame using a self-correcting tracking method. The computing device can select points of interest in the first image frame. The computing device can track the selected points of interest from the first image frame to the second image frame using optical flow object tracking. The computing device can prune the matching pairs of points and generate a transform based on the remaining matching pairs to detect the selected object in the second image frame. The computing device can generate a tracking confidence metric based on a projection error for each point of interest tracked from the first frame to the second frame. The computing device can correct tracking errors by reacquiring the object when the tracking confidence metric is below a threshold value.
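The abstract, together with claims 2 to 6, describes pruning the tracked point pairs by clustering, fitting a global transform to the surviving pairs, and scoring tracking confidence from the average projection error. The text reproduced here does not specify the transform model or the confidence formula. The following is a minimal pure-Python sketch under stated assumptions, not the patented implementation: k-means with k = 2 on the point-pair distances (per claim 2), a least-squares similarity transform (scale, rotation, translation) standing in for the unspecified "global transformation matrix", a hypothetical confidence of 1 / (1 + mean projection error), and a hypothetical threshold `THRESHOLD = 0.5`. Optical-flow tracking itself is not shown. The hand-written `pairs` list stands in for its output, and all function names are illustrative.

```python
"""Sketch: prune tracked point pairs, fit a global transform, score confidence.

Assumptions (not taken from the patent): k-means with k=2 on the scalar
point-pair distances, a least-squares similarity transform as the global
transformation matrix, confidence = 1 / (1 + mean projection error), and a
threshold of 0.5. All names are hypothetical.
"""
from typing import List, Tuple

Point = Tuple[float, float]
Pair = Tuple[Point, Point]  # (point in first frame, tracked point in second frame)


def kmeans_1d(values: List[float], k: int = 2, iters: int = 20) -> List[int]:
    """Lloyd's algorithm on scalars; returns a cluster label per value."""
    lo, hi = min(values), max(values)
    # Seed the centres evenly between the smallest and largest value.
    centres = [lo + (hi - lo) * c / max(k - 1, 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = sum(members) / len(members)
    return labels


def prune_pairs(pairs: List[Pair]) -> List[Pair]:
    """Keep only the pairs that fall in the most populated distance cluster."""
    dists = [((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) ** 0.5 for p, q in pairs]
    labels = kmeans_1d(dists)
    biggest = max(set(labels), key=labels.count)
    return [pair for pair, lab in zip(pairs, labels) if lab == biggest]


def fit_similarity(pairs: List[Pair]) -> List[List[float]]:
    """Least-squares scale + rotation + translation, returned as a 2x3 matrix."""
    src = [complex(*p) for p, _ in pairs]
    dst = [complex(*q) for _, q in pairs]
    ms, md = sum(src) / len(src), sum(dst) / len(dst)
    cs = [s - ms for s in src]  # centred first-frame points
    cd = [d - md for d in dst]  # centred second-frame points
    num = sum(s.conjugate() * d for s, d in zip(cs, cd))
    den = sum(s.real ** 2 + s.imag ** 2 for s in cs)
    a = num / den if den else 1 + 0j  # a = scale * exp(i * angle)
    t = md - a * ms                   # translation
    return [[a.real, -a.imag, t.real], [a.imag, a.real, t.imag]]


def confidence(pairs: List[Pair], m: List[List[float]]) -> float:
    """Average projection error over the pairs, mapped into (0, 1]."""
    errs = []
    for (x, y), (qx, qy) in pairs:
        px = m[0][0] * x + m[0][1] * y + m[0][2]
        py = m[1][0] * x + m[1][1] * y + m[1][2]
        errs.append(((px - qx) ** 2 + (py - qy) ** 2) ** 0.5)
    return 1.0 / (1.0 + sum(errs) / len(errs))


THRESHOLD = 0.5  # hypothetical cut-off below which the object is reacquired

pairs = [((0, 0), (5, 1)), ((10, 0), (15, 1)), ((0, 10), (5, 11)),
         ((10, 10), (15, 11)), ((4, 4), (40, -30))]  # last pair has drifted
kept = prune_pairs(pairs)
m = fit_similarity(kept)
score = confidence(kept, m)
needs_reacquire = score < THRESHOLD
print(len(kept), round(score, 3), needs_reacquire)  # 4 1.0 False
```

Clustering the scalar distances, rather than the 2-D displacement vectors, follows the wording of claim 2. In the example, the drifted pair lands in the smaller cluster and is dropped before the transform is fitted. The four remaining pairs are then explained exactly by a shift of (5, 1), so the confidence stays above the threshold and no reacquisition is triggered.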
What is claimed is:
1. A method comprising: one or more processors; and receiving, by a computing device, user input identifying an image area of a first frame of a video, the image area including an object to be tracked from the first frame to a second frame of the video by the computing device; selecting, by the computing device, interest points within the image area; filtering, by the computing device, the selected interest points using a non-maximum suppression filter; generating, by the computing device, matching point pairs based on the filtered interest points, where each point pair in the matching point pairs includes a first selected interest point in the first frame and a corresponding predicted point in the second frame, where the predicted point in the second frame is determined by tracking the particular interest point from the first frame to the corresponding predicted point in the second frame; pruning, by the computing device, one or more point pairs in the matching point pairs by clustering the matching point pairs and removing matching point pairs that do not belong to a cluster having a largest number of matching point pairs; and tracking, by the computing device, the object in the first frame to the second frame based on the interest points in the matching point pairs that belong to the largest matching point pair cluster; wherein the non-maximum suppression filter is a double-pass non-maximum suppression filter that uses two concentric radii to filter interest points on a first pass through the selected interest points and a single radius to filter interest points on a second pass through the selected interest points.
2. The method of claim 1, wherein the pruning comprises: determining a distance between the interest point and the corresponding predicted point in each of the corresponding pairs in the matching point pairs; and clustering the matching point pairs by performing k-means clustering based on distances corresponding to each matching point pair to generate two or more clusters of matching point pairs.
3. The method of claim 1, further comprising: generating a global transformation matrix which maps the interest points in the first frame that are included in the largest matching point pair cluster to corresponding points in the second frame.
4. The method of claim 3, further comprising: calculating an average point projection error for the first frame, where the projected error is a distance between a predicted point and an actual point determined using the global transformation matrix; generating a confidence value based on the global transformation matrix and the average projected error for the first frame.
5. The method of claim 4, further comprising: presenting a graphical representation of the confidence value calculated for the first frame of the video on a display of the computing device.
6. The method of claim 4, further comprising: determining that the confidence value is below a threshold value; and in response to determining that the confidence value is below a threshold value, reacquiring the object in the second frame by performing an object recognition method using a scale invariant feature transform descriptor.
7. A non-transitory computer-readable medium including one or more sequences of instructions that, when executed by one or more processors, causes: receiving, by a computing device, user input identifying an image area of a first frame of a video, the image area including an object to be tracked from the first frame to a second frame of the video by the computing device; selecting, by the computing device, interest points within the image area; filtering, by the computing device, the selected interest points using a non-maximum suppression filter; generating, by the computing device, matching point pairs based on the filtered interest points, where each point pair in the matching point pairs includes a first selected interest point in the first frame and a corresponding predicted point in the second frame, where the predicted point in the second frame is determined by tracking the particular interest point from the first frame to the corresponding predicted point in the second frame; pruning, by the computing device, one or more point pairs in the matching point pairs by clustering the matching point pairs and removing matching point pairs that do not belong to a cluster having a largest number of matching point pairs; and tracking, by the computing device, the object in the first frame to the second frame based on the interest points in the matching point pairs that belong to the largest matching point pair cluster; wherein the non-maximum suppression filter is a double-pass non-maximum suppression filter that uses two concentric radii to filter interest points on a first pass through the selected interest points and a single radius to filter interest points on a second pass through the selected interest points.
8. The non-transitory computer-readable medium of claim 7, wherein the instructions that cause pruning include instructions that cause: determining a distance between the interest point and the corresponding predicted point in each of the corresponding pairs in the matching point pairs; and clustering the matching point pairs by performing k-means clustering based on distances corresponding to each matching point pair to generate two or more clusters of matching point pairs.
9. The non-transitory computer-readable medium of claim 7, wherein the instructions cause: generating a global transformation matrix which maps the interest points in the first frame that are included in the largest matching point pair cluster to corresponding points in the second frame.
10. The non-transitory computer-readable medium of claim 9, wherein the instructions cause: calculating an average point projection error for the first frame, where the projected error is a distance between a predicted point and an actual point determined using the global transformation matrix; generating a confidence value based on the global transformation matrix and the average projected error for the first frame.
11. The non-transitory computer-readable medium of claim 10, wherein the instructions cause: presenting a graphical representation of the confidence value calculated for the first frame of the video on a display of the computing device.
12. The non-transitory computer-readable medium of claim 10, wherein the instructions cause: determining that the confidence value is below a threshold value; and in response to determining that the confidence value is below a threshold value, reacquiring the object in the second frame by performing an object recognition method using a scale invariant feature transform descriptor.
13. A computing device comprising: one or more processors; and a non-transitory computer-readable medium including one or more sequences of instructions that, when executed by the one or more processors, causes: receiving, by the computing device, user input identifying an image area of a first frame of a video, the image area including an object to be tracked from the first frame to a second frame of the video by the computing device; selecting, by the computing device, interest points within the image area; filtering, by the computing device, the selected interest points using a non-maximum suppression filter; generating, by the computing device, matching point pairs based on the filtered interest points, where each point pair in the matching point pairs includes a first selected interest, point in the first frame and a corresponding predicted point in the seco
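Claim 1 requires a double-pass non-maximum suppression filter that uses two concentric radii on the first pass and a single radius on the second, but the claim text does not say how the two radii interact. The sketch below shows one hypothetical reading, not the patented filter. On the first pass, a point is dropped if any stronger point lies inside the inner radius, or if a point at least `ratio` times stronger lies inside the outer radius. The second pass is ordinary greedy NMS with one radius. The function names, the parameters `r_inner`, `r_outer`, `ratio`, and `radius`, and their default values are all assumptions chosen for illustration.

```python
"""Hypothetical reading of a double-pass non-maximum suppression filter.

Assumption: pass 1 uses two concentric radii (any stronger point inside
r_inner suppresses; only a `ratio`-times stronger point inside r_outer
suppresses); pass 2 is plain greedy NMS with a single radius.
"""
from math import dist
from typing import List, Tuple

Interest = Tuple[float, float, float]  # (x, y, corner response score)


def first_pass(pts: List[Interest], r_inner: float, r_outer: float,
               ratio: float) -> List[Interest]:
    """Two-radius pass; every point is compared against all the others."""
    kept = []
    for i, p in enumerate(pts):
        suppressed = False
        for j, q in enumerate(pts):
            if i == j:
                continue
            d = dist(p[:2], q[:2])
            if d <= r_inner and q[2] > p[2]:
                suppressed = True  # any stronger point very close by
            elif d <= r_outer and q[2] >= ratio * p[2]:
                suppressed = True  # a much stronger point anywhere within r_outer
            if suppressed:
                break
        if not suppressed:
            kept.append(p)
    return kept


def second_pass(pts: List[Interest], radius: float) -> List[Interest]:
    """Greedy single-radius NMS: visit strongest first, keep well-spaced points."""
    kept: List[Interest] = []
    for p in sorted(pts, key=lambda t: t[2], reverse=True):
        if all(dist(p[:2], k[:2]) > radius for k in kept):
            kept.append(p)
    return kept


def double_pass_nms(pts: List[Interest], r_inner: float = 3.0,
                    r_outer: float = 8.0, ratio: float = 2.0,
                    radius: float = 5.0) -> List[Interest]:
    # Default radii and ratio are illustrative placeholders, not patent values.
    return second_pass(first_pass(pts, r_inner, r_outer, ratio), radius)


pts = [(10, 10, 9.0), (11, 10, 5.0), (14, 10, 6.0), (16, 10, 4.0),
       (30, 30, 7.0), (33, 30, 6.5), (60, 60, 1.0)]
filtered = double_pass_nms(pts)
print(filtered)  # [(10, 10, 9.0), (30, 30, 7.0), (60, 60, 1.0)]
```

With these defaults, the point at (14, 10) survives the first pass. Its stronger neighbour at (10, 10) is 4 pixels away, which is outside `r_inner`, and is not twice as strong. The single-radius second pass then removes it, because it lies within 5 pixels of that stronger, already-kept point.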
using feature-based methods, e.g. the tracking of corners or segments · CPC title
involving reference images or patches · CPC title
Matching criteria, e.g. proximity measures · CPC title
with fixed number of clusters, e.g. K-means clustering · CPC title
Physics · mapped topic