Efficient cnn-based solution for video frame interpolation
US-2020356827-A1 · Nov 12, 2020 · US
US12073567B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12073567-B2 |
| Application number | US-202117187831-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 28, 2021 |
| Priority date | Feb 27, 2020 |
| Publication date | Aug 27, 2024 |
| Grant date | Aug 27, 2024 |
Official abstract text for this publication.
A method of analysing objects in a first frame and a second frame is disclosed. The method includes segmenting the frames, and matching at least one object in the first frame with a corresponding object in the second frame. The method optionally includes estimating the motion of the at least one matched object between the frames. Also disclosed is a method of generating a training dataset suitable for training machine learning algorithms to estimate the motion of objects. Also provided are processing systems configured to carry out these methods.
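As a rough illustration of the segment-then-match flow described in the abstract, the following Python sketch pairs object instances across two frames by feature-vector distance and then derives a coarse per-object displacement from the matched masks. The dict keys (`mask`, `feature`), the function names, the greedy nearest-neighbour pairing, the `max_feature_dist` threshold, and the centroid-shift motion estimate are all assumptions made for illustration. The publication does not specify them, and the claims describe a clustering-based matcher rather than this simplified stand-in.

```python
from math import dist


def centroid(mask):
    """Mean (row, col) of the set pixels in a binary mask (list of rows).

    Assumes the mask contains at least one set pixel.
    """
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


def match_instances(first, second, max_feature_dist=0.5):
    """Greedily pair instances across frames, closest feature vectors first.

    Each instance is a dict with the hypothetical keys "mask" and "feature".
    Returns a list of (index_in_first, index_in_second) pairs.
    """
    candidates = sorted(
        (dist(a["feature"], b["feature"]), i, j)
        for i, a in enumerate(first)
        for j, b in enumerate(second)
    )
    used_first, used_second, pairs = set(), set(), []
    for d, i, j in candidates:
        if d > max_feature_dist:
            break  # remaining candidates are even further apart
        if i in used_first or j in used_second:
            continue  # each instance is matched at most once
        used_first.add(i)
        used_second.add(j)
        pairs.append((i, j))
    return pairs


def coarse_motion(mask_a, mask_b):
    """Assumed coarse motion: (d_row, d_col) displacement of the mask centroid."""
    (r0, c0), (r1, c1) = centroid(mask_a), centroid(mask_b)
    return (r1 - r0, c1 - c0)


# Toy data: one 2x2 object that moves down by 1 row and right by 2 columns.
frame1 = [{"mask": [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]], "feature": [1.0, 0.0]}]
frame2 = [{"mask": [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], "feature": [1.0, 0.2]}]

for i, j in match_instances(frame1, frame2):
    print(i, j, coarse_motion(frame1[i]["mask"], frame2[j]["mask"]))
```

A learned refinement stage, as the claims describe, would take such a coarse estimate as one of its inputs; that stage is not sketched here.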
Opening claim text (preview).
What is claimed is:
1. A method of analyzing one or more objects in a set of frames comprising at least a first frame and a second frame, the method comprising: segmenting the first frame to produce a plurality of first masks, each first mask being a pixel map identifying pixels belonging to a potential object-instance detected in the first frame; for each potential object-instance detected in the first frame, extracting from the first frame a first feature vector characterising the potential object-instance; segmenting the second frame to produce a plurality of second masks, each second mask being a pixel map identifying pixels belonging to a potential object-instance detected in the second frame; for each potential object-instance detected in the second frame, extracting from the second frame a second feature vector characterising the potential object-instance; and matching at least one of the potential object-instances in the first frame with one of the potential object-instances in the second frame, based at least in part on the first feature vectors, the first masks, the second feature vectors and the second masks, wherein the matching comprises clustering the potential object-instances detected in the first and second frames, based at least in part on the first feature vectors and the second feature vectors, to generate clusters of potential object-instances.
2. The method of claim 1, wherein the matching further comprises, for each cluster in each frame: evaluating a distance between the potential object-instances in the cluster in that frame; and splitting the cluster into multiple clusters based on a result of the evaluating.
3. The method of claim 1, wherein the matching comprises selecting a single object-instance from among the potential object-instances in each cluster in each frame.
4. The method of claim 3, wherein the matching comprises matching at least one of the single object-instances in the first frame with a single object-instance in the second frame.
5. The method of claim 1, wherein the matching comprises rejecting potential object-instances based on any one or any combination of two or more of the following: an object confidence score, which estimates whether a potential object-instance is more likely to be an object or part of the background; a mask confidence score, which estimates a likelihood that a mask represents an object; and a mask area.
6. The method of claim 5, wherein the mask confidence score is generated by a machine learning algorithm trained to predict a degree of correspondence between the mask and a ground truth mask.
7. The method of claim 1, wherein the masks and feature vectors are generated by a first machine learning algorithm.
8. The method of claim 1, further comprising for at least one matched object in the first frame and the second frame, estimating a motion of the object between the first frame and the second frame.
9. The method of claim 8, wherein estimating the motion of the object comprises, for each of a plurality of pixels of the object: estimating a translational motion vector; estimating a non-translational motion vector; and calculating a motion vector of the pixel as the sum of the translational motion vector and the non-translational motion vector.
10. The method of claim 8, wherein estimating the motion of the object comprises: generating a coarse estimate of the motion based at least in part on the mask in the first frame and the corresponding matched mask in the second frame; and refining the coarse estimate using a second machine learning algorithm, wherein the second machine learning algorithm takes as input the first frame, the second frame, and the coarse estimate, and the second machine learning algorithm is trained to predict a motion difference between the coarse motion vector and a ground truth motion vector.
11. The method of claim 10, wherein the machine learning algorithm is trained to predict the motion difference at a plurality of resolutions, starting with the lowest resolution and predicting the motion difference at successively higher resolutions based on up-sampling the motion difference from the preceding resolution.
12. An image processing system, comprising: a memory, configured to store a set of frames comprising at least a first frame and a second frame; and a first segmentation block, configured to segment the first frame, to produce a plurality of first masks, each first mask being a pixel map identifying pixels belonging to a potential object-instance detected in the first frame; a first feature extraction block, configured to, for each potential object-instance detected in the first frame, extract from the first frame a first feature vector characterising the potential object-instance; a second segmentation block, configured to segment the second frame, to produce a plurality of second masks, each second mask being a pixel map identifying pixels belonging to a potential object-instance detected in the second frame; a second feature extraction block, configured to, for each potential object-instance detected in the second frame, extract from the second frame a second feature vector characterising the potential object-instance; and a matching block, configured to match at least one of the potential object-instances in the first frame with one of the potential object-instances in the second frame, based at least in part on the first feature vectors, the first masks, the second feature vectors and the second masks, wherein the matching block is configured to cluster the potential object-instances detected in the first and second frames, based at least in part on the first feature vectors and the second feature vectors, to generate clusters of potential object-instances.
13. The image processing system of claim 12, wherein the first and second segmentation blocks are the same segmentation block, and/or the first and second feature extraction blocks are the same feature extraction block.
14. The image processing system of claim 12, further comprising a motion estimation block, configured to estimate the motion of objects matched by the matching block.
15. The image processing system of claim 12, wherein the matching block is further configured to, for each cluster in each frame: evaluate a distance between the potential object-instances in the cluster in that frame; and split the cluster into multiple clusters based on a result of the evaluating.
16. The image processing system of claim 12, wherein the matching block is configured to, for each cluster in each frame, select a single object-instance from among the potential object-instances of that cluster, and to match one of the single object-instances in the first frame with a single object-instance in the second frame.
17. The image processing system of claim 12 wherein the masks and feature vectors are generated by a first machine learning algorithm.
18. A non-transitory computer readable storage medium having stored thereon computer readable code configured to cause to be performed, when the code is run, a method of analyzing one or more objects in a set of frames comprising at least a first frame and a second frame, the method comprising: segmenting the first frame, to produce a plurality of first masks, each first mask being a pixel map identifying pixels belonging to a potential object-instance detected in the first frame; for each potential object-instance detected in the first frame, extracting from the first frame a first feature vector characterising the pote
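The clustering-based matching recited in claims 1, 3, 4 and 5 can be sketched in Python as follows. This is only one possible reading under stated assumptions: the claims do not name a clustering algorithm, so single-linkage clustering with a fixed distance threshold is swapped in here, and the highest object confidence score is used to select a single instance per cluster per frame. The dict keys (`id`, `frame`, `feature`, `score`, `area`), the function names, and all threshold values are hypothetical.

```python
from math import dist


def cluster_by_feature(instances, threshold=0.5):
    """Single-linkage clustering on feature vectors (assumed technique).

    Two instances whose features lie within `threshold` of each other are
    linked; linked groups become clusters. Returns lists of indices.
    """
    parent = list(range(len(instances)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(instances)):
        for j in range(i + 1, len(instances)):
            if dist(instances[i]["feature"], instances[j]["feature"]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(instances)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def match_via_clusters(instances, threshold=0.5, min_score=0.5, min_area=4):
    """Match instances between frame 0 and frame 1 through shared clusters."""
    # Reject weak candidates by object confidence score and mask area.
    kept = [
        inst for inst in instances
        if inst["score"] >= min_score and inst["area"] >= min_area
    ]
    matches = []
    for members in cluster_by_feature(kept, threshold):
        best = {}  # frame index -> highest-scoring instance in this cluster
        for idx in members:
            inst = kept[idx]
            current = best.get(inst["frame"])
            if current is None or inst["score"] > current["score"]:
                best[inst["frame"]] = inst
        # A match needs one selected instance in each of the two frames.
        if 0 in best and 1 in best:
            matches.append((best[0]["id"], best[1]["id"]))
    return matches


candidates = [
    {"id": "a0", "frame": 0, "feature": [0.0, 0.0], "score": 0.9, "area": 10},
    {"id": "a0_dup", "frame": 0, "feature": [0.1, 0.0], "score": 0.6, "area": 9},
    {"id": "a1", "frame": 1, "feature": [0.05, 0.0], "score": 0.8, "area": 10},
    {"id": "b0", "frame": 0, "feature": [5.0, 5.0], "score": 0.95, "area": 20},
    {"id": "noise", "frame": 1, "feature": [0.0, 0.1], "score": 0.2, "area": 10},
]
print(match_via_clusters(candidates))  # [('a0', 'a1')]
```

In this toy input, the duplicate detection `a0_dup` falls into the same cluster as `a0` and loses the per-frame selection, `noise` is rejected by its low score, and `b0` has no counterpart in the second frame, so only one pair is reported. The cluster-splitting step of claim 2 and the mask confidence score of claim 6 are left out of the sketch.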
Clustering techniques · CPC title
Matching criteria, e.g. proximity measures · CPC title
Training; Learning · CPC title
involving reference images or patches · CPC title
for motion estimation over a hierarchy of resolutions (multi-resolution motion estimation or hierarchical motion estimation for coding, decoding, compressing or decompressing digital video signals H04N19/53) · CPC title