US11417079B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11417079-B2 |
| Application number | US-202016928455-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 14, 2020 |
| Priority date | Jul 14, 2020 |
| Publication date | Aug 16, 2022 |
| Grant date | Aug 16, 2022 |
Official abstract text for this publication.
In an approach for guiding a visually impaired user to position a mobile device appropriately in relation to a screen so that dynamic information on the screen can be reliably extracted and conveyed to the visually impaired user, a processor receives an image captured by a camera of a mobile device. A processor performs object recognition on the image to identify a digital screen and a location of the digital screen in the image. A processor retrieves a template of the digital screen. A processor performs angle-sensitive optical character recognition (OCR) on the location of the digital screen in the image. Responsive to a processor determining text on the digital screen can be extracted, a processor conveys the text to a user. Responsive to a processor determining text on the digital screen cannot be extracted, a processor guides the user to re-orient the mobile device to capture a better image.
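The re-orientation step can be illustrated with a minimal Python sketch. The claims state that the angle is calculated for each text region relative to a horizontal axis; everything else here is an assumption for illustration: the function names `region_angle_deg` and `rotation_hint`, the 5-degree tolerance, the image coordinate convention (y grows downward), and the mapping from the sign of the angle to a rotation direction, which would need to be verified on a real device.

```python
import math


def region_angle_deg(top_left, top_right):
    """Angle of a text region's top edge relative to the horizontal axis.

    Points are (x, y) pixel coordinates with y growing downward
    (an assumed convention; the patent text does not specify one).
    """
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))


def rotation_hint(angle_deg, tolerance_deg=5.0):
    """Turn a skew angle into a spoken-style hint (hypothetical wording)."""
    if abs(angle_deg) <= tolerance_deg:
        return "Hold steady."
    # Assumption: positive skew in image coordinates is corrected by a
    # clockwise rotation of the phone; confirm this against real captures.
    direction = "clockwise" if angle_deg > 0 else "counterclockwise"
    return f"Rotate the phone about {round(abs(angle_deg))} degrees {direction}."


if __name__ == "__main__":
    angle = region_angle_deg((0, 0), (10, 10))
    print(rotation_hint(angle))
```

In a real application the hint string would be passed to a text-to-speech engine rather than printed.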
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: receiving, by one or more processors, an image captured by a camera of a user mobile device; performing, by the one or more processors, object recognition on the image to identify a digital screen and a location of the digital screen in the image, wherein the digital screen is identified to be of a known type, brand, or model of digital screen; retrieving, by the one or more processors, a template of the digital screen based on the known type, brand, or model of digital screen; performing, by the one or more processors, angle-sensitive optical character recognition (OCR) on the location of the digital screen in the image to detect rectangular regions in the image that contain text and calculate an angle of the rectangular regions relative to a horizontal axis; determining, by the one or more processors, whether, within the image, the text on the digital screen can be extracted based on whether the rectangular regions overlap within a pre-defined threshold with expected text locations based on the template; and responsive to determining the text cannot be extracted, guiding, by the one or more processors, a user of the user mobile device to re-orient at least one of a position and a rotation of the user mobile device based on the angle calculated using the angle-sensitive OCR to capture another image.

2. The computer-implemented method of claim 1, further comprising: responsive to determining the text can be extracted, audibly conveying, by the one or more processors, the text to a user of the user mobile device using text-to-speech.

3. The computer-implemented method of claim 1, wherein performing angle-sensitive OCR on the location of the digital screen in the image comprises: detecting, by the one or more processors, rectangular regions in the image that contain text; calculating, by the one or more processors, angles of the rectangular regions relative to a horizontal axis; and converting, by the one or more processors, text detected in these rectangular regions from image data to text data.

4. The computer-implemented method of claim 3, further comprising: wherein the template indicates a set of locations of where text would be located on the digital screen; and comparing, by the one or more processors, locations of the rectangular regions in the image to the set of locations of where text would be located on the digital screen based on the template.

5. The computer-implemented method of claim 4, wherein determining whether the text on the digital screen can be extracted is based on comparing the locations of the rectangular regions in the image to the set of locations of where text would be located on the digital screen based on the template.

6. A computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: program instructions to receive an image captured by a camera of a user mobile device; program instructions to perform object recognition on the image to identify a digital screen and a location of the digital screen in the image, wherein the digital screen is identified to be of a known type, brand, or model of digital screen; program instructions to retrieve a template of the digital screen based on the known type, brand, or model of digital screen; program instructions to perform angle-sensitive optical character recognition (OCR) on the location of the digital screen in the image to detect rectangular regions in the image that contain text and calculate an angle of the rectangular regions relative to a horizontal axis; program instructions to determine whether, within the image, text on the digital screen can be extracted based on whether the rectangular regions overlap within a pre-defined threshold with expected text locations based on the template; and responsive to determining the text cannot be extracted, program instructions to guide a user of the user mobile device to re-orient at least one of a position and a rotation of the user mobile device based on the angle calculated using the angle-sensitive OCR to capture another image.

7. The computer program product of claim 6, further comprising: responsive to determining the text can be extracted, program instructions to audibly convey the text to a user of the user mobile device using text-to-speech.

8. The computer program product of claim 6, wherein the program instructions to perform angle-sensitive OCR on the location of the digital screen in the image comprise: program instructions to detect rectangular regions in the image that contain text; program instructions to calculate angles of the rectangular regions relative to a horizontal axis; and program instructions to convert text detected in these rectangular regions from image data to text data.

9. The computer program product of claim 8, further comprising: wherein the template indicates a set of locations of where text would be located on the digital screen; and program instructions to compare locations of the rectangular regions in the image to the set of locations of where text would be located on the digital screen based on the template.

10. The computer program product of claim 9, wherein the program instructions to determine whether the text on the digital screen can be extracted is based on comparing the locations of the rectangular regions in the image to the set of locations of where text would be located on the digital screen based on the template.

11. A computer system comprising: one or more computer processors; one or more computer readable storage media; program instructions stored on the computer readable storage media for execution by at least one of the one or more processors, the program instructions comprising: program instructions to receive an image captured by a camera of a user mobile device; program instructions to perform object recognition on the image to identify a digital screen and a location of the digital screen in the image, wherein the digital screen is identified to be of a known type, brand, or model of digital screen; program instructions to retrieve a template of the digital screen based on the known type, brand, or model of digital screen; program instructions to perform angle-sensitive optical character recognition (OCR) on the location of the digital screen in the image to detect rectangular regions in the image that contain text and calculate an angle of the rectangular regions relative to a horizontal axis; program instructions to determine whether, within the image, text on the digital screen can be extracted based on whether the rectangular regions overlap within a pre-defined threshold with expected text locations based on the template; and responsive to determining the text cannot be extracted, program instructions to guide a user of the user mobile device to re-orient at least one of a position and a rotation of the user mobile device based on the angle calculated using the angle-sensitive OCR to capture another image.

12. The computer system of claim 11, further comprising: responsive to determining the text can be extracted, program instructions to audibly convey the text to a user of the user mobile device using text-to-speech.

13. The computer system of claim 11, wherein the program instructions to perform angle-sensitive OCR on the location of the digital screen in the image comprise: program instructions to detect rectangular regions in the image that contain text; program instructions to calculate angles of the rectangul
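The extractability test in the independent claims compares detected text rectangles against expected text locations from the template, using a pre-defined overlap threshold. A minimal Python sketch of one way to run such a comparison follows. The claims do not name an overlap measure, so intersection-over-union (IoU) is swapped in here as an assumption, as are the `Box` type, the function names, the 0.5 threshold, and the simplification to axis-aligned rectangles already mapped into the template's coordinate frame.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in template coordinates (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def text_extractable(detected: List[Box], expected: List[Box],
                     threshold: float = 0.5) -> bool:
    """True if every expected text location is matched by some detected
    region with IoU at or above the threshold (assumed decision rule)."""
    if not expected:
        return False  # no template locations to check against
    return all(any(iou(e, d) >= threshold for d in detected) for e in expected)


if __name__ == "__main__":
    template = [Box(10, 10, 40, 10), Box(10, 30, 40, 10)]
    good = [Box(11, 10, 40, 10), Box(10, 31, 40, 10)]
    bad = [Box(60, 60, 20, 10)]
    print(text_extractable(good, template))
    print(text_extractable(bad, template))
```

Requiring every expected location to be matched is a strict choice; a looser rule, such as a minimum fraction of matched locations, would tolerate partially hidden screens at the cost of conveying incomplete text.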
Matching criteria, e.g. proximity measures · CPC title
Text, e.g. of license plates, overlay texts or captions on TV images · CPC title
Teaching or communicating with blind persons (G09B21/02 - G09B21/06 take precedence) · CPC title
Speech synthesis; Text to speech systems · CPC title
Inclination or skew detection or correction of characters or of image to be recognised · CPC title