Frame Level And Video Level Text Detection In Video
US-2020349381-A1 · Nov 5, 2020 · US
US11216684B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11216684-B1 |
| Application number | US-202016781456-A |
| Country | US |
| Kind code | B1 |
| Filing date | Feb 4, 2020 |
| Priority date | Feb 4, 2020 |
| Publication date | Jan 4, 2022 |
| Grant date | Jan 4, 2022 |
Official abstract text for this publication.
Techniques are described for detecting and replacing burned-in subtitles in image and video content.
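One way the replacing step could work is the interpolation named in the claims (claims 2, 6, and 11): pixels flagged as subtitle pixels are re-estimated from nearby non-subtitle pixels. The sketch below is a minimal illustration under stated assumptions: the frame is grayscale and stored as nested Python lists, a boolean mask already marks the subtitle pixels, and the interpolation runs along each row only. The names `fill_masked_row` and `replace_subtitle_pixels` are hypothetical and do not come from the patent.

```python
def fill_masked_row(row, mask):
    """Return a copy of `row` with masked entries linearly interpolated
    from the nearest unmasked neighbours on the left and right.

    Assumption: row-wise linear interpolation is an illustrative choice;
    the patent text only says "interpolation".
    """
    out = list(row)
    n = len(row)
    i = 0
    while i < n:
        if not mask[i]:
            i += 1
            continue
        start = i
        while i < n and mask[i]:
            i += 1
        end = i  # exclusive end of the masked run
        left = row[start - 1] if start > 0 else None
        right = row[end] if end < n else None
        if left is None and right is None:
            continue  # the whole row is masked; nothing to interpolate from
        if left is None:
            left = right  # run touches the left edge: copy the right neighbour
        if right is None:
            right = left  # run touches the right edge: copy the left neighbour
        span = end - start + 1
        for k in range(start, end):
            out[k] = left + (right - left) * (k - start + 1) / span
    return out


def replace_subtitle_pixels(frame, mask):
    """Apply the row-wise fill to every row of a grayscale frame."""
    return [fill_masked_row(r, m) for r, m in zip(frame, mask)]


if __name__ == "__main__":
    frame = [[10, 255, 255, 40],
             [5, 5, 5, 5]]
    mask = [[False, True, True, False],
            [False, False, False, False]]
    print(replace_subtitle_pixels(frame, mask))
```

A production system would more likely use a two-dimensional or temporal inpainting method; the one-dimensional fill is only meant to make the idea of approximating the removed pixels concrete.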
Opening claim text (preview).
What is claimed is:

1. A method for detecting and removing burned-in subtitles from video content, comprising: obtaining a video, wherein the video comprises a motion picture or an episode of a television series, wherein the video includes an image frame having burned-in subtitles, and wherein the video has an accompanying audio track; detecting, within the image frame, an area of the image frame that bounds the burned-in subtitles; detecting, within the area of the image frame that bounds the burned-in subtitles, first pixels associated with the burned-in subtitles, wherein the subtitles were burned-into the image frame by removing second pixels associated with non-subtitle image content and inserting, in place of the second pixels, the first pixels associated with the burned-in subtitles, wherein detecting the first pixels includes: processing at least a portion of the audio track through a speech recognition algorithm; processing at least a portion of the image frame through an optical character recognition algorithm; and determining, based on a comparison of outputs of the speech recognition and optical character recognition algorithms, that the image frame contains subtitle text; determining, based at least on third pixels in the image frame associated with non-subtitle image content, approximations of the second pixels associated with non-subtitle image content; and outputting a version of the video in which the first pixels of the image frame have been replaced based on the approximations of the second pixels.

2. The method of claim 1, wherein determining the approximations of the second pixels comprises performing interpolations based on the at least some of the third pixels.

3. The method of claim 1, wherein a fraction of the third pixels lie within the detected area that bounds the burned-in subtitles and wherein determining the approximations of the second pixels comprises performing interpolations based the fraction of the third pixels that lie within the detected area.

4. The method of claim 1, wherein detecting the area of the image frame that bounds the burned-in subtitles comprises calculating a confidence score indicative of the likelihood that the area includes subtitle-text instead of non-subtitle text and determining that the confidence score exceeds a threshold.

5. A method for removing burned-in subtitles from video content, comprising: obtaining a video, wherein the video includes an image frame having burned-in subtitles, and wherein the video has an accompanying audio track; detecting, within the image frame, first pixels associated with the burned-in subtitles wherein detecting the first pixels includes: processing at least a portion of the audio track through a speech recognition algorithm; processing at least a portion of the image frame through an optical character recognition algorithm; and determining, based on a comparison of outputs of the speech recognition and optical character recognition algorithms, that the image frame contains subtitle text; determining replacement pixels for the first pixels; and outputting a version of the video in which the first pixels of the image frame are replaced with the replacement pixels.

6. The method of claim 5, wherein determining the replacement pixels comprises interpolation based at least on second pixels in the image frame.

7. The method of claim 5, wherein the image frame comprises a first image frame, wherein the video includes a second image frame that precedes or follows the first image frame, and wherein determining the replacement pixels comprises analyzing image content in the first image frame.

8. The method of claim 5, further comprising calculating a confidence score indicative of the likelihood that an area of the image frame corresponding to the first pixels includes subtitle-text instead of non-subtitle text and determining that the confidence score exceeds a threshold.

9. The method of claim 5, wherein detecting the first pixels associated with the burned-in subtitles comprises processing the image frame through a neural network configured to detect burned-in subtitles.

10. A system, comprising one or more processors and memory configured to: obtain a video, wherein the video includes an image frame having burned-in subtitles, and wherein the video has an accompanying audio track; detect, within the image frame, first pixels associated with the burned-in subtitles wherein the one or more processors and memory are configured to detect the first pixels by: processing at least a portion of the audio track through a speech recognition algorithm; processing at least a portion of the image frame through an optical character recognition algorithm; and determining, based on a comparison of outputs of the speech recognition and optical character recognition algorithms, that the image frame contains subtitle text; determine replacement pixels for the first pixels; and output a version of the video in which the first pixels of the image frame are replaced with the replacement pixels.

11. The system of claim 10, wherein the processors and memory are configured to determine the replacement pixels by interpolation based at least on second pixels in the image frame.

12. The system of claim 10, wherein the image frame comprises a first image frame, wherein the video includes a second image frame that precedes or follows the first image frame, and wherein the processors and memory are configured to determine the replacement pixels by analyzing image content in the first image frame.

13. The system of claim 10, wherein the processors and memory are further configured to calculate a confidence score indicative of the likelihood that an area of the image frame corresponding to the first pixels includes subtitle-text instead of non-subtitle text and determining that the confidence score exceeds a threshold.

14. The system of claim 10, wherein the processors and memory are configured to detect the first pixels associated with the burned-in subtitles by processing the image frame through a neural network configured to detect burned-in subtitles.
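The detection step shared by the independent claims compares the output of speech recognition on the audio track with the output of optical character recognition on the frame. The sketch below illustrates one possible form of that comparison, assuming both recognizers have already produced plain strings in the same language. The function names, the use of `difflib.SequenceMatcher` as the similarity measure, and the default threshold of 0.8 are all illustrative assumptions; the claims do not specify a metric or a value, and the similarity ratio here merely stands in for the kind of confidence score described in claims 4, 8, and 13.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so that
    OCR and speech-recognition strings can be compared fairly."""
    return " ".join(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())


def subtitle_confidence(ocr_text: str, asr_text: str) -> float:
    """Similarity in [0, 1] between on-screen text and spoken words.

    Assumption: a character-level similarity ratio is used as the
    confidence score; the patent does not name a specific measure.
    """
    a, b = normalize(ocr_text), normalize(asr_text)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def is_subtitle(ocr_text: str, asr_text: str, threshold: float = 0.8) -> bool:
    """Treat the on-screen text as subtitle text when the score
    exceeds the (hypothetical) threshold."""
    return subtitle_confidence(ocr_text, asr_text) > threshold


if __name__ == "__main__":
    spoken = "where are you going"
    print(is_subtitle("Where are you going?", spoken))  # text matches speech
    print(is_subtitle("EXIT", spoken))                  # scene text, e.g. a sign
```

Text that appears in the scene itself, such as a sign, tends to score low because it does not track the dialogue, which is how a comparison of this kind can separate subtitle text from non-subtitle text.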
Segmentation of character regions · CPC title
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
using neural networks · CPC title
Overlay text, e.g. embedded captions in a TV programme · CPC title
Video; Image sequence · CPC title