Techniques for machine language translation of text from an image based on non-textual context information from the image
US-2016371256-A1 · Dec 22, 2016 · US
US9659224B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9659224-B1 |
| Application number | US-201414230471-A |
| Country | US |
| Kind code | B1 |
| Filing date | Mar 31, 2014 |
| Priority date | Mar 31, 2014 |
| Publication date | May 23, 2017 |
| Grant date | May 23, 2017 |
Official abstract text for this publication.
Disclosed are techniques for merging optical character recognized (OCR'd) text from frames of image data. In some implementations, a device sends frames of image data to a server, where each frame includes at least a portion of a captured textual item. The server performs optical character recognition (OCR) on the image data of each frame. When OCR'd text from respective frames is returned to the device from the server, the device can perform matching operations on the text, for instance, using bounding boxes and/or edit distance processing. The device can merge any identified matches of OCR'd text from different frames. The device can then display the merged text with any corrections.
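The matching step described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the `OcrItem` type, the `(left, top, right, bottom)` box format, the use of intersection-over-union as the overlap measure, and the default thresholds `max_distance` and `min_overlap` are all assumptions made for illustration. The sketch also assumes both bounding boxes have already been mapped into a common coordinate system.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom); assumed format


@dataclass(frozen=True)
class OcrItem:
    """Hypothetical container for one OCR result returned by the server."""
    text: str
    box: Box


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (one possible overlap measure)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(first: OcrItem, second: OcrItem,
             max_distance: int = 2, min_overlap: float = 0.5) -> bool:
    """Treat two items as the same text when both tests pass (assumed rule)."""
    return (edit_distance(first.text, second.text) <= max_distance
            and overlap_ratio(first.box, second.box) >= min_overlap)


frame1 = OcrItem("555-O123", (0, 0, 100, 20))
frame2 = OcrItem("555-0123", (2, 1, 102, 21))
print(is_match(frame1, frame2))
```

Requiring both tests to pass is only one way to combine them; the abstract says "bounding boxes and/or edit distance processing", so a real system might weigh them differently.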
Opening claim text (preview).
What is claimed is:

1. A non-transitory computer-readable storage medium storing instructions executable by one or more processors of a device to cause a method to be performed for merging recognized text from a plurality of frames of image data, the method comprising: sending, from the device to one or more servers in communication with the device via a network, at least a portion of a first frame of image data including a first portion of a scene having at least a first captured textual item, the first captured textual item within a first bounding box corresponding to a region of the first frame; receiving, from the one or more servers, first recognized text corresponding to the first captured textual item, the one or more servers having generated the first recognized text using optical character recognition; displaying the first recognized text on a display; sending, from the device to the one or more servers, at least a portion of a second frame of image data including a second portion of the scene having at least a second captured textual item, the second captured textual item within a second bounding box corresponding to a region of the second frame; receiving, from the one or more servers, second recognized text corresponding to the second captured textual item; determining an edit distance between the first recognized text and the second recognized text; determining that the edit distance satisfies an edit distance threshold, wherein the edit distance threshold depends on at least one of a type of the first recognized text or of the second recognized text; determining an overlap of the first bounding box relative with the second bounding box; determining that the first captured textual item matches the second captured textual item based at least in part on the edit distance and on the overlap; generating merged text based at least in part on the first recognized text and the second recognized text; and displaying the merged text on the display.

2. The non-transitory computer-readable storage medium of claim 1, wherein merging the first recognized text and the second recognized text to produce merged text comprises selecting a first portion of the first recognized text based at least in part on a first confidence level and selecting a second portion of the second recognized text based at least in part on a second confidence level.

3. The non-transitory computer-readable storage medium of claim 1, wherein determining the transformation of the first frame of image data to the second frame of image data comprises determining a homography corresponding to the first and second frames of image data.

4. The non-transitory computer-readable storage medium of claim 1, wherein displaying the merged text comprises removing the first text and indicating differences between the merged text and the first text.

5. A device comprising: one or more processors operable to: send, to one or more servers in communication with the device via a network, at least a portion of a first frame of image data including at least a first captured textual item within a first bounding box corresponding to a region of the first frame; receive, from the one or more servers, first recognized text corresponding to the first captured textual item; send, to the one or more servers, at least a portion of a second frame of image data including at least a second captured textual item within a second bounding box corresponding to a region of the second frame; receive, from the one or more servers, second recognized text corresponding to the second captured textual item; compare first characters of the first recognized text with second characters of the second recognized text, wherein comparing the first characters with the second characters includes: determine an edit distance between the first recognized text and the second recognized text, and determine that the edit distance satisfies an edit distance threshold, wherein the edit distance threshold depends on a type of at least one of the first recognized text or of the second recognized text; determine an overlap of the first bounding box relative to the second bounding box; determine that the first captured textual item matches the second captured textual item based at least in part on (i) the comparison of characters of the first recognized text with characters of the second recognized text and (ii) the overlap; generate merged text based at least in part on the first recognized text and the second recognized text; and display, on a display, the merged text.

6. The device of claim 5, wherein displaying the merged text on the display at the device comprises: replacing previously displayed text with the merged text.

7. The device of claim 6, wherein the previously displayed text is replaced with the merged text when an update condition is satisfied, the update condition being satisfied when: the merged text includes one or more differences from the previously displayed text, and the merged text has a confidence level greater than a confidence level of the previously displayed text.

8. The device of claim 7, wherein the confidence level of the merged text is determined by one or more of: an optical character recognition engine processing the text or a semantic analysis of the text.

9. The device of claim 5, wherein displaying the merged text on the display at the device comprises: removing the previously displayed text and indicating differences between the merged text and the previously displayed text.

10. The device of claim 5, wherein at least a portion of the displayed text is actionable to cause a computing event to occur.

11. The device of claim 5, further comprising: a camera operable to capture the first frame and the second frame.

12. The device of claim 5, wherein generating the merged text comprises selecting a first portion of the first recognized text based at least in part on a first confidence level and selecting a second portion of the second recognized text based at least in part on a second confidence level.

13. A method comprising: sending, from a device to one or more servers in communication with the device via a network, at least a portion of a first frame of image data including at least a first captured textual item within a first bounding box corresponding to a region of the first frame; receiving, from the one or more servers, first recognized text corresponding to the first captured textual item; sending, to the one or more servers, at least a portion of a second frame of image data including at least a second captured textual item within a second bounding box corresponding to a region of the second frame; receiving, from the one or more servers, second recognized text corresponding to the second captured textual item; comparing first characters of the first recognized text with second characters of the second recognized text, wherein comparing the first characters with the second characters includes: determining an edit distance between the first recognized text and the second recognized text, and determining that the edit distance satisfies an edit distance threshold, wherein the edit distance threshold depends on at least one of a type of the first recognized text or of the second recognized text; determining an overlap of the first bounding box relative with the second bounding box; determining that the first captured textual item matches the second captured textual item based at least in part on the comparison of characters of the first recognized text with characters of the second recognized text and on the overlap; generating merged text based at leas
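Three ideas from the claims lend themselves to a short illustration: an edit distance threshold that depends on the type of text (claims 1, 5, and 13), merged text built by selecting portions according to confidence (claims 2 and 12), and the update condition for replacing displayed text (claim 7). The sketch below is one possible reading, not the claimed implementation. The text-type names, the threshold values, the `(token, confidence)` word representation, and the assumption that both readings contain the same number of position-aligned words are all hypothetical.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, float]  # (token, confidence in [0, 1]); assumed representation

# Hypothetical thresholds: stricter for types where one wrong character matters.
EDIT_DISTANCE_THRESHOLDS: Dict[str, int] = {"phone": 1, "url": 2, "plain": 3}


def threshold_for(text_type: str) -> int:
    """Look up the edit distance threshold for a text type (default: plain)."""
    return EDIT_DISTANCE_THRESHOLDS.get(text_type, EDIT_DISTANCE_THRESHOLDS["plain"])


def merge_by_confidence(first: List[Word], second: List[Word]) -> List[Word]:
    """Keep, position by position, whichever reading has higher confidence.

    Assumes the two readings are already aligned word for word.
    """
    if len(first) != len(second):
        raise ValueError("this sketch only handles equal-length, aligned readings")
    return [a if a[1] >= b[1] else b for a, b in zip(first, second)]


def should_update(merged_text: str, merged_conf: float,
                  shown_text: str, shown_conf: float) -> bool:
    """Update condition: the text differs and the merged confidence is higher."""
    return merged_text != shown_text and merged_conf > shown_conf


frame1 = [("Call", 0.9), ("555-O123", 0.4)]
frame2 = [("Ca11", 0.3), ("555-0123", 0.8)]
merged = merge_by_confidence(frame1, frame2)
merged_text = " ".join(token for token, _ in merged)
print(merged_text)
```

Taking the minimum word confidence as the confidence of the whole string is one conservative way to feed `should_update`; claim 8 leaves the source of the confidence level open (an OCR engine or semantic analysis).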
using context analysis, e.g. lexical, syntactic or semantic context · CPC title
of classification results, e.g. where the classifiers operate on the same input data · CPC title
of printed characters having additional code marks or containing code marks · CPC title
of classification results, e.g. of results related to same input data · CPC title
Character recognition · CPC title