What technology area does this patent fall under?

Primary CPC classification G06V10/82. Mapped technology areas include Physics.

When was this patent published?

Publication date Thu Nov 05 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Frame Level And Video Level Text Detection In Video

US2020349381A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2020349381-A1
Application number	US-201916399823-A
Country	US
Kind code	A1
Filing date	Apr 30, 2019
Priority date	Apr 30, 2019
Publication date	Nov 5, 2020
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In some embodiments, a method detects a first set of frames in a video that include lines of text, the detecting performed at a frame level on each individual frame. A first representation is generated from the first set of frames and a second representation is generated from the first set of frames. The method filters the first representation based on a number of lines of text within a space in the space dimension to select a second set of frames and filters the second representation based on a number of frames within time intervals in the time dimension to select a third set of frames. Frames in both the second set of frames and the third set of frames are analyzed to determine whether the lines of text in both the second set of frames and the third set of frames are burned-in subtitles.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method comprising: detecting, by a computing device, a first set of frames in a video that include lines of text, the detecting performed at a frame level on each individual frame; generating, by the computing device, a first representation from at least a first portion of the first set of frames and a second representation from at least a second portion of the first set of frames, the first representation generated based on a space dimension and the second representation generated based on a time dimension; filtering, by the computing device, the first representation based on a number of lines of text within a space in the space dimension to select a second set of frames; filtering, by the computing device, the second representation based on a number of frames within one or more time intervals in a plurality of time intervals in the time dimension to select a third set of frames; and analyzing, by the computing device, frames in both the second set of frames and the third set of frames to determine whether the lines of text in both the second set of frames and the third set of frames are burned-in subtitles. 2 . The method of claim 1 , wherein detecting the first set of frames comprises analyzing each frame for lines of text without using information from other frames. 3 . The method of claim 1 , wherein generating the first representation and the second representation comprises analyzing multiple frames from the first set of frames. 4 . The method of claim 1 , wherein detecting the first set of frames comprises: predicting whether lines of characters are present in a frame; predicting character positions from the lines of text; and connecting the character positions together to form lines of text. 5 . The method of claim 1 , wherein filtering the first representation based on the number of lines of text within the space in the space dimension comprises: summarizing the number of lines of text found in different spaces in the at least the first portion of the first set of frames; and selecting the second set of frames based on the number of lines of text found in the space. 6 . The method of claim 1 , wherein filtering the first representation based on the number of lines of text within the space in the space dimension comprises: generating a structure that includes a value for a number of text lines found in different spaces in the at least the first portion of the first set of frames; and selecting the second set of frames based on values for the number of text lines found in the space. 7 . The method of claim 1 , wherein filtering the second representation based on the number of frames within the one or more time intervals in the plurality of time intervals in the time dimension comprises summarizing a number of lines of text in the at least the second portion of the first set of frames that are found in different time intervals; and selecting the third set of frames based on the number of lines of text in the one or more time intervals. 8 . The method of claim 1 , wherein filtering the second representation based on the number of frames within the time interval in the plurality of time intervals in the time dimension comprises generating a structure that includes a value for a number of lines of text in the at least the first portion of the first set of frames that are found in different time intervals in the plurality of time intervals; and selecting the third set of frames based on the number of lines of text in the one or more time intervals. 9 . The method of claim 1 , wherein filtering the second representation based on the number of frames within the one or more time intervals in the plurality of time intervals in the time dimension comprises summarizing a number of frames in the at least the first portion of the first set of frames that are found in different time intervals in the plurality of time intervals; and selecting the third set of frames based on the number of frames in the one or more time intervals. 10 . The method of claim 1 , wherein analyzing the frames in both the second set of frames and the third set of frames to determine whether the lines of text in both the second set of frames and the third set of frames are burned-in subtitles comprises: comparing a number of the frames that are included in both the second set of frames and the third set of frames to a threshold; determining that the video includes burned-in subtitles when the number of frames is above the threshold; and determining that the video does not include burned-in subtitles when the number of frames is above the threshold. 11 . The method of claim 1 , further comprising: when frames that are included in both the second set of frames and the third set of frames are determined to include burned-in subtitles, analyzing the burned-in subtitles to determine a language type of the burned-in subtitles. 12 . The method of claim 11 , wherein analyzing the burned-in subtitles comprises: analyzing the burned-in subtitles for a character type; and when the character type is associated with a language in a plurality of languages, outputting the language associated with the character type. 13 . The method of claim 12 , wherein analyzing the burned-in subtitles comprises: when the character type is associated with multiple languages, analyzing words of the burned-in subtitles to select a language within the multiple languages. 14 . The method of claim 1 , wherein generating the first representation from at least the first portion of the first set of frames and the second representation from at least the second portion of the first set of frames comprises: generating the first representation from the first set of frames; and generating the second representation from the second set of frames. 15 . The method of claim 1 , wherein generating the first representation from at least the first portion of the first set of frames and the second representation from at least the second portion of the first set of frames comprises: generating the second representation from the first set of frames; and generating the first representation from the second set of frames. 16 . A non-transitory computer-readable storage medium containing instructions, that when executed, control a computer system to be operable for: detecting a first set of frames in a video that include lines of text, the detecting performed at a frame level on each individual frame; generating a first representation from at least a first portion of the first set of frames and a second representation from at least a second portion of the first set of frames, the first representation generated based on a space dimension and the second representation generated based on a time dimension; filtering the first representation based on a number of lines of text within a space in the space dimension to select a second set of frames; filtering the second representation based on a number of frames within one or more time intervals in a plurality of time intervals in the time dimension to select a third set of frames; and analyzing frames in both the second set of frames and the third set of frames to determine whether the lines of text in both the second set of frames and the third set of frames are burned-in subtitles. 17 . The non-transitory computer-readable storage medium of claim 16 , wherein: detecting the first set of frames comprises analyzing each frame for lines of text without using information from other frames, and generating the firs

Assignees

Hulu Llc

Inventors

Classifications

G06V30/287
of Kanji, Hiragana or Katakana characters · CPC title
G06V20/635
Overlay text, e.g. embedded captions in a TV programme · CPC title
G06V30/19173
Classification techniques · CPC title
G06V10/82Primary
using neural networks · CPC title
G06V30/153
using recognition of characters or words · CPC title

Patent family

Related publications grouped by family.

View patent family 73017566

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020349381A1 cover?: In some embodiments, a method detects a first set of frames in a video that include lines of text, the detecting performed at a frame level on each individual frame. A first representation is generated from the first set of frames and a second representation is generated from the first set of frames. The method filters the first representation based on a number of lines of text within a space i…
Who is the assignee on this patent?: Hulu Llc
What technology area does this patent fall under?: Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?: Publication date Thu Nov 05 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).