Method of video stabilization using background subtraction
US-2019172184-A1 · Jun 6, 2019 · US
US10778939B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10778939-B2 |
| Application number | US-201715713596-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 22, 2017 |
| Priority date | Sep 22, 2017 |
| Publication date | Sep 15, 2020 |
| Grant date | Sep 15, 2020 |
Official abstract text for this publication.
An effects application receives a video of a face and detects a bounding box for each frame indicating the location and size of the face in each frame. In one or more reference frames, the application uses an algorithm to determine locations of facial features in the frame. The application then normalizes the feature locations relative to the bounding box and saves the normalized feature locations. In other frames (e.g., target frames), the application obtains the bounding box and then predicts the locations of the facial features based on the size and location of the bounding box and the normalized feature locations calculated in the reference frame. The predicted locations can be made available to an augmented reality function that overlays graphics in a video stream based on face tracking in order to apply a desired effect to the video.
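The normalize-then-predict step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Box` type and the `normalize_features` and `predict_features` names are hypothetical, and the sketch assumes image coordinates with the origin at the top-left and offsets measured from the left and top edges of the bounding box.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box (top-left origin assumed)."""
    x: float  # horizontal position of the left edge
    y: float  # vertical position of the top edge
    w: float  # width
    h: float  # height


def normalize_features(points: List[Point], ref: Box) -> List[Point]:
    """Express each feature as a ratio of its offset to the reference box size."""
    if ref.w <= 0 or ref.h <= 0:
        raise ValueError("reference box must have positive width and height")
    return [((px - ref.x) / ref.w, (py - ref.y) / ref.h) for px, py in points]


def predict_features(normalized: List[Point], target: Box) -> List[Point]:
    """Scale normalized locations by the target box size, then offset by its position."""
    return [(target.x + nx * target.w, target.y + ny * target.h)
            for nx, ny in normalized]


# Reference frame: one feature detected inside a 200x100 box at (100, 50).
reference_box = Box(x=100, y=50, w=200, h=100)
normalized = normalize_features([(150.0, 75.0)], reference_box)

# Target frame: the face box has moved and doubled in size.
target_box = Box(x=10, y=20, w=400, h=200)
predicted = predict_features(normalized, target_box)
```

Under these assumptions the feature detector runs only on reference frames; each target frame needs only a bounding box plus two multiplications and two additions per feature.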
Opening claim text (preview).
The invention claimed is:
1. A method comprising: receiving a video comprising a sequence of video frames; detecting a face of a subject in a target video frame of the sequence of video frames; obtaining a target bounding box for the detected face in the target video frame, the target bounding box having edges aligning with detected outermost pixels of the detected face; obtaining a reference bounding box for the detected face in a reference video frame of the video; applying a feature detection algorithm to detect location of facial features for the detected face in the reference video frame; determining a horizontal offset of a first location of a given facial feature from a vertical edge of the reference bounding box; determining a normalized first location as a ratio of the horizontal offset to a width of the reference bounding box; determining a vertical offset of a first location of the given facial feature from a horizontal edge of the reference bounding box; determining a normalized second location as a ratio of the vertical offset to a height of the reference bounding box; applying a transformation to the target bounding box to estimate locations of the facial features for the detected face in the target video frame, the transformation based on the normalized first and second locations; applying an effect to alter the target video frame based on the estimated locations of the facial features for the detected face in the target video frame to generate an altered video frame; and outputting the altered video frame.
2. The method of claim 1, wherein applying the transformation comprises: determining a relative horizontal offset of a given facial feature within the target bounding box as a product of a width of the target bounding box and the normalized horizontal location; and offsetting the relative horizontal offset by a horizontal location of the target bounding box to determine an absolute horizontal position of the given facial feature; determining a relative vertical offset of the given facial feature within the target bounding box as a product of a height of the target bounding box and the normalized vertical location; and offsetting the relative vertical offset by a vertical location of the target bounding box to determine an absolute vertical position of the given facial feature.
3. The method of claim 1, wherein applying the effect comprises: warping a mask image such that alignment points on the mask image align with the estimated locations of the facial features; and overlaying the mask image on the target frame.
4. The method of claim 3, wherein applying the effect further comprises: predicting a rotation of the face based on a narrowing of a width of the target bounding box relative to a bounding box for prior video frame; and rotating the mask image based on the predicted rotation.
5. The method of claim 1, wherein the facial features comprise eyes and a nose.
6. A non-transitory computer-readable storage medium storing instructions that when executed by a processor cause the processor to perform steps including: receiving a video comprising a sequence of video frames; detecting a face of a subject in a target video frame of the sequence of video frames; obtaining a target bounding box for the detected face in the target video frame, the target bounding box having edges aligning with detected outermost pixels of the detected face; obtaining a reference bounding box for the detected face in a reference video frame of the video; applying a feature detection algorithm to detect location of facial features for the detected face in the reference video frame; determining a horizontal offset of a first location of a given facial feature from a vertical edge of the reference bounding box; determining a normalized first location as a ratio of the horizontal offset to a width of the reference bounding box; determining a vertical offset of a first location of the given facial feature from a horizontal edge of the reference bounding box; determining a normalized second location as a ratio of the vertical offset to a height of the reference bounding box; applying a transformation to the target bounding box to estimate locations of the facial features for the detected face in the target video frame, the transformation based on the normalized first and second locations; applying an effect to alter the target video frame based on the estimated locations of the facial features for the detected face in the target video frame to generate an altered video frame; and outputting the altered video frame.
7. The non-transitory computer-readable storage medium of claim 6, wherein applying the transformation comprises: determining a relative horizontal offset of a given facial feature within the target bounding box as a product of a width of the target bounding box and the normalized horizontal location; and offsetting the relative horizontal offset by a horizontal location of the target bounding box to determine an absolute horizontal position of the given facial feature; determining a relative vertical offset of the given facial feature within the target bounding box as a product of a height of the target bounding box and the normalized vertical location; and offsetting the relative vertical offset by a vertical location of the target bounding box to determine an absolute vertical position of the given facial feature.
8. The non-transitory computer-readable storage medium of claim 6, wherein applying the effect comprises: warping a mask image such that alignment points on the mask image align with the estimated locations of the facial features; and overlaying the mask image on the target frame.
9. The non-transitory computer-readable storage medium of claim 8, wherein applying the effect further comprises: predicting a rotation of the face based on a narrowing of a width of the target bounding box relative to a bounding box for prior video frame; and rotating the mask image based on the predicted rotation.
10. The non-transitory computer-readable storage medium of claim 6, wherein the facial features comprise eyes and a nose.
11. A computer device comprising: a processor; and a non-transitory computer-readable storage medium storing instructions that when executed by a processor cause the processor to perform steps including: receiving a video comprising a sequence of video frames; detecting a face of a subject in a target video frame of the sequence of video frames; obtaining a target bounding box for the detected face in the target video frame, the target bounding box having edges aligning with detected outermost pixels of the detected face; obtaining a reference bounding box for the detected face in a reference video frame of the video; applying a feature detection algorithm to detect location of facial features for the detected face in the reference video frame; determining a horizontal offset of a first location of a given facial feature from a vertical edge of the reference bounding box; determining a normalized first location as a ratio of the horizontal offset to a width of the reference bounding box; determining a vertical offset of a first location of the given facial feature from a horizontal edge of the reference bounding box; determining a normalized second location as a ratio of the vertical offset to a height of the reference bounding box; applying a transformation to the target bounding box to estimate locations of the facial features for the detected face in the target video frame, the transformation based on the normalized first and second locations; applying an effect to alter the target video frame based on the estimated locations of the facial features for the detected face in the target video frame to generate an altered video frame; and outputting the altered video frame f
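Claims 4 and 9 recite predicting a rotation of the face from a narrowing of the bounding box width but, in the text shown here, give no formula. The sketch below is one possible reading, not the claimed method: it assumes the face behaves like a flat plane turning about a vertical axis, that the prior frame shows the face frontally, and that the projected width therefore shrinks by the cosine of the turn angle. The `estimate_yaw_degrees` name is hypothetical, and the sign of the rotation (left or right) cannot be recovered from the width alone.

```python
import math


def estimate_yaw_degrees(prior_width: float, target_width: float) -> float:
    """Estimate the magnitude of a head turn from bounding box narrowing.

    Assumption (not from the claims): target_width = prior_width * cos(yaw).
    """
    if prior_width <= 0:
        raise ValueError("prior_width must be positive")
    # Clamp so a box that widened or has a non-positive width stays in acos's domain.
    ratio = max(0.0, min(1.0, target_width / prior_width))
    return math.degrees(math.acos(ratio))


no_turn = estimate_yaw_degrees(200.0, 200.0)    # same width, no narrowing
half_width = estimate_yaw_degrees(200.0, 100.0)  # box narrowed to half its width
```

A mask image could then be rotated by the estimated angle before being overlaid, with the direction of the turn taken from some other cue such as the horizontal shift of the box.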
involving the mixing of the reproduced video signal with a non-recorded signal, e.g. a text signal · CPC title
Local features and components; Facial parts (eye characteristics G06V40/18); Occluding parts, e.g. glasses; Geometrical relationships · CPC title
Holistic features and representations, i.e. based on the facial image taken as a whole · CPC title
in augmented reality scenes · CPC title
Creating or editing images; Combining images with text · CPC title