Text detection in video
US-9036083-B1 · May 19, 2015 · US
US9876982B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9876982-B2 |
| Application number | US-201514694719-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 23, 2015 |
| Priority date | May 28, 2014 |
| Publication date | Jan 23, 2018 |
| Grant date | Jan 23, 2018 |
Official abstract text for this publication.
Techniques of detecting text in video are disclosed. In some embodiments, a portion of video content can be identified as having text. Text within the identified portion of the video content can be identified. A category for the identified text can be determined. In some embodiments, a determination is made as to whether the video content satisfies at least one predetermined condition, and the portion of video content is identified as having text in response to a determination that the video content satisfies the predetermined condition(s). In some embodiments, the predetermined condition(s) comprises at least one of a minimum level of clarity, a minimum level of contrast, and a minimum level of content stability across multiple frames. In some embodiments, additional information corresponding to the video content is determined based on the identified text and the determined category.
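The abstract names contrast and content stability across frames as possible predetermined conditions but does not say how they are measured. The following is a minimal sketch under stated assumptions, not the patented method: frames are small grayscale grids, contrast is taken as the standard deviation of pixel intensities, and stability as the mean absolute difference between consecutive frames. The function names and both thresholds (`min_contrast`, `max_frame_diff`) are hypothetical, and the clarity condition is not modeled.

```python
from statistics import pstdev

Frame = list[list[int]]  # grayscale frame: rows of 0-255 intensities (assumed layout)


def contrast(frame: Frame) -> float:
    """Population standard deviation of all pixel intensities in the frame."""
    return pstdev([p for row in frame for p in row])


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two same-sized frames."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def satisfies_conditions(frames: list[Frame],
                         min_contrast: float = 40.0,    # hypothetical threshold
                         max_frame_diff: float = 5.0    # hypothetical threshold
                         ) -> bool:
    """True if every frame has enough contrast and consecutive frames barely change."""
    if not frames:
        return False
    if any(contrast(f) < min_contrast for f in frames):
        return False
    return all(mean_abs_diff(a, b) <= max_frame_diff
               for a, b in zip(frames, frames[1:]))


if __name__ == "__main__":
    checker = [[0, 255], [255, 0]]    # high contrast
    flat = [[128, 128], [128, 128]]   # no contrast
    print(satisfies_conditions([checker, checker]))  # True
    print(satisfies_conditions([flat, flat]))        # False
```

Under this reading, text identification would only be attempted on a run of frames for which `satisfies_conditions` returns `True`.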
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: identifying, by a machine having a memory and at least one processor, a portion of video content as having text, the identifying the portion of the video content comprising: performing a connected component analysis on a frame of the video content to detect connected components within the frame; merging the connected components into a plurality of text lines; refining the plurality of text lines using horizontal and vertical projections in order to remove one or more text lines from the plurality of text lines; filtering out at least one of the plurality of text lines based on a size of the at least one of the plurality of text lines to form a filtered set of text lines; binarizing the filtered set of text lines formed by the filtering out of the at least one of the plurality of text lines; and filtering out at least one of the text lines from the binarized filtered set of text lines based on at least one of a shape of components in the at least one of the text lines and a position of components in the at least one of the text lines to form the portion of the video content having text; identifying the text within the identified portion of the video content; determining a category for the identified text; determining additional information corresponding to the video content based on the identified text and the determined category; and causing a software application on a media content device to perform a function using the additional information, the function corresponding to the determined category.

2. The method of claim 1, wherein the additional information comprises a uniform resource locator (URL), and causing the software application on the media content device to perform the function comprises causing the URL to be loaded on a browser on the media content device.

3. The method of claim 1, wherein the additional information comprises a phone number, and causing the software application on the media content device to perform the function comprises causing the media content device to provide a prompt to call the phone number.

4. The method of claim 1, further comprising causing the additional information to be displayed on the media content device.

5. The method of claim 1, further comprising storing the additional information in association with the video content or in association with an identified viewer of the video content.

6. The method of claim 1, wherein the additional information comprises at least one of an identification of a user account and a metadata tag.

7. The method of claim 1, wherein the media content device comprises one of a television, a laptop computer, a desktop computer, a tablet computer, and a smartphone.

8. The method of claim 1, further comprising storing the identified text in association with the video content or in association with an identified viewer of the video content.

9. The method of claim 1, wherein identifying the portion of the video content having text further comprises: converting a frame of the video content to grayscale; performing edge detection on the frame; performing dilation on the frame to connect vertical edges within the frame; and binarizing the frame.

10. The method of claim 1, wherein identifying text within the identified portion of the video content comprises performing optical character recognition on the identified portion of the video content.

11. The method of claim 1, wherein determining the category for the identified text comprises: parsing the identified text to determine a plurality of segments of the identified text; and determining the category based on a stored association between at least one of the plurality of segments and the category.

12. The method of claim 1, wherein the video content comprises a portion of a television program, a non-episodic movie, a webisode, user-generated content for a video-sharing website, or a commercial.

13. The method of claim 1, wherein the text comprises alphanumeric characters.

14. A system comprising: a machine having a memory and at least one processor; and at least one module on the machine, the at least one module being configured to perform operations comprising: identifying a portion of the video content as having text, the identifying the portion of the video content comprising: performing a connected component analysis on a frame of the video content to detect connected components within the frame; merging the connected components into a plurality of text lines; refining the plurality of text lines using horizontal and vertical projections in order to remove one or more text lines from the plurality of text lines; filtering out at least one of the plurality of text lines based on a size of the at least one of the plurality of text lines; binarizing the filtered set of text lines formed by the filtering out of the at least one of the plurality of text lines; and filtering out at least one of the text lines from the binarized filtered set of text lines based on at least one of a shape of components in the at least one of the text lines and a position of components in the at least one of the text lines to form the portion of the video content having text; identifying the text within the identified portion of the video content; determining a category for the identified text; determining additional information corresponding to the video content based on the identified text and the determined category; and causing a software application on a media content device to perform a function using the additional information, the function corresponding to the determined category.

15. The system of claim 14, wherein the additional information comprises one of a uniform resource locator (URL) and a phone number, and causing the software application on the media content device to perform the function comprises one of causing the URL to be loaded on a browser on the media content device and causing the media content to provide a prompt to call the phone number.

16. A non-transitory machine-readable storage device, tangibly embodying a set of instructions that, when executed by at least one processor, causes the at least one processor to perform a set of operations comprising: identifying a portion of the video content as having text, the identifying the portion of the video content comprising: performing a connected component analysis on a frame of the video content to detect connected components within the frame; merging the connected components into a plurality of text lines; refining the plurality of text lines using horizontal and vertical projections in order to remove one or more text lines from the plurality of text lines; filtering out at least one of the plurality of text lines based on a size of the at least one of the plurality of text lines to form a filtered set of text lines; binarizing the filtered set of text lines formed by the filtering out of the at least one of the plurality of text lines; and filtering out at least one of the text lines from the binarized filtered set of text lines based on at least one of a shape of components in the at least one of the text lines and a position of components in the at least one of the text lines to form the portion of the video content having text; identifying the text within the identified portion of the video content; determining a category for the identified text; determining additional information corresponding to the video content based on the identified text and the determined category; and causing a softwar
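Three of the identification steps recited in the independent claims (connected component analysis, merging components into text lines, and size-based filtering) can be illustrated with a small sketch. This is an assumption-laden illustration rather than the patent's implementation: the frame is taken to be already binarized (1 = foreground), components use 4-connectivity, merging is a simple greedy rule, and the parameters `max_gap`, `min_width`, and `min_height` are hypothetical. Projection-based refinement, per-line binarization, and shape/position filtering are not shown.

```python
from collections import deque

Box = tuple[int, int, int, int]  # (left, top, right, bottom), inclusive pixel coordinates


def connected_components(binary: list[list[int]]) -> list[Box]:
    """Bounding boxes of 4-connected foreground (value 1) regions, found by BFS."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes: list[Box] = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


def merge_into_lines(boxes: list[Box], max_gap: int = 2) -> list[Box]:
    """Greedily merge boxes that overlap vertically and whose left edge lies
    at most max_gap columns past the right edge of an existing line."""
    lines: list[Box] = []
    for box in sorted(boxes):  # process left to right
        for i, line in enumerate(lines):
            overlaps_vertically = box[1] <= line[3] and line[1] <= box[3]
            close_horizontally = box[0] - line[2] <= max_gap
            if overlaps_vertically and close_horizontally:
                lines[i] = (min(line[0], box[0]), min(line[1], box[1]),
                            max(line[2], box[2]), max(line[3], box[3]))
                break
        else:
            lines.append(box)  # no compatible line: start a new one
    return lines


def filter_by_size(lines: list[Box], min_width: int = 3, min_height: int = 2) -> list[Box]:
    """Drop candidate lines whose bounding box is too small to plausibly hold text."""
    return [l for l in lines
            if l[2] - l[0] + 1 >= min_width and l[3] - l[1] + 1 >= min_height]


if __name__ == "__main__":
    frame = [
        [1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0],
        [1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    ]
    lines = filter_by_size(merge_into_lines(connected_components(frame)))
    print(lines)  # [(0, 0, 7, 1)]
```

In the toy frame, the three 2x2 blobs on the top rows merge into one candidate line, while the isolated pixel in the corner forms its own one-pixel line and is removed by the size filter. A production pipeline would more likely use an image library's labeling routine than a hand-written BFS.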
communicating with other users, e.g. chatting {(arrangements for providing for computer conferences, e.g. chat rooms, to substation in data switching networks H04L12/1813; distributed application using peer-to-peer [P2P] networks H04L67/104)} · CPC title
for displaying additional information (H04N5/50 takes precedence) · CPC title
involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream (arrangements characterised by components specially adapted for monitoring, identification or recognition of video in broadcast systems H04H60/59) · CPC title
for displaying messages, e.g. warnings, reminders (arrangements for providing short real-time information to substation in data switching networks H04L12/1895) · CPC title
by using a URL (processing chained hypermedia data for information retrieval G06F16/94; information retrieval from the Internet by using URLs G06F16/955; URL in broadcast information H04H20/93; Web-based protocols H04L67/02) · CPC title