Merging optical character recognized text from frames of image data

US9659224B1 · US · B1

Patent metadata
Publication number: US-9659224-B1
Application number: US-201414230471-A
Country: US
Kind code: B1
Filing date: Mar 31, 2014
Priority date: Mar 31, 2014
Publication date: May 23, 2017
Grant date: May 23, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are techniques for merging optical character recognized (OCR'd) text from frames of image data. In some implementations, a device sends frames of image data to a server, where each frame includes at least a portion of a captured textual item. The server performs optical character recognition (OCR) on the image data of each frame. When OCR'd text from respective frames is returned to the device from the server, the device can perform matching operations on the text, for instance, using bounding boxes and/or edit distance processing. The device can merge any identified matches of OCR'd text from different frames. The device can then display the merged text with any corrections.
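The matching step described in the abstract can be sketched as follows. This is only an illustration, not the patented implementation: it assumes Levenshtein distance as the edit distance, intersection-over-union as the bounding-box overlap measure, and bounding boxes that have already been mapped into a common coordinate system. The names `Box`, `THRESHOLDS`, `is_match`, and `min_overlap`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; assumed to be in a shared coordinate system."""
    left: float
    top: float
    right: float
    bottom: float


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution
        prev = curr
    return prev[-1]


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area divided by union area (0.0 when the boxes are disjoint)."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.bottom, b.bottom) - max(a.top, b.top)
    if width <= 0 or height <= 0:
        return 0.0
    inter = width * height
    area_a = (a.right - a.left) * (a.bottom - a.top)
    area_b = (b.right - b.left) * (b.bottom - b.top)
    return inter / (area_a + area_b - inter)


# Hypothetical per-type thresholds: stricter for text where one wrong
# character changes the meaning (for example, a phone number).
THRESHOLDS = {"phone": 1, "url": 2, "generic": 3}


def is_match(text_a: str, box_a: Box, text_b: str, box_b: Box,
             text_type: str = "generic", min_overlap: float = 0.5) -> bool:
    """Treat two OCR results as the same item if both tests pass."""
    limit = THRESHOLDS.get(text_type, THRESHOLDS["generic"])
    close_text = edit_distance(text_a, text_b) <= limit
    close_boxes = overlap_ratio(box_a, box_b) >= min_overlap
    return close_text and close_boxes
```

For example, `is_match("555-0100", Box(0, 0, 10, 2), "555-0l00", Box(1, 0, 11, 2), "phone")` accepts the pair, since the texts differ by one character and the boxes overlap heavily.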

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable storage medium storing instructions executable by one or more processors of a device to cause a method to be performed for merging recognized text from a plurality of frames of image data, the method comprising: sending, from the device to one or more servers in communication with the device via a network, at least a portion of a first frame of image data including a first portion of a scene having at least a first captured textual item, the first captured textual item within a first bounding box corresponding to a region of the first frame; receiving, from the one or more servers, first recognized text corresponding to the first captured textual item, the one or more servers having generated the first recognized text using optical character recognition; displaying the first recognized text on a display; sending, from the device to the one or more servers, at least a portion of a second frame of image data including a second portion of the scene having at least a second captured textual item, the second captured textual item within a second bounding box corresponding to a region of the second frame; receiving, from the one or more servers, second recognized text corresponding to the second captured textual item; determining an edit distance between the first recognized text and the second recognized text; determining that the edit distance satisfies an edit distance threshold, wherein the edit distance threshold depends on at least one of a type of the first recognized text or of the second recognized text; determining an overlap of the first bounding box relative with the second bounding box; determining that the first captured textual item matches the second captured textual item based at least in part on the edit distance and on the overlap; generating merged text based at least in part on the first recognized text and the second recognized text; and displaying the merged text on the display.

2. The non-transitory computer-readable storage medium of claim 1, wherein merging the first recognized text and the second recognized text to produce merged text comprises selecting a first portion of the first recognized text based at least in part on a first confidence level and selecting a second portion of the second recognized text based at least in part on a second confidence level.

3. The non-transitory computer-readable storage medium of claim 1, wherein determining the transformation of the first frame of image data to the second frame of image data comprises determining a homography corresponding to the first and second frames of image data.

4. The non-transitory computer-readable storage medium of claim 1, wherein displaying the merged text comprises removing the first text and indicating differences between the merged text and the first text.

5. A device comprising: one or more processors operable to: send, to one or more servers in communication with the device via a network, at least a portion of a first frame of image data including at least a first captured textual item within a first bounding box corresponding to a region of the first frame; receive, from the one or more servers, first recognized text corresponding to the first captured textual item; send, to the one or more servers, at least a portion of a second frame of image data including at least a second captured textual item within a second bounding box corresponding to a region of the second frame; receive, from the one or more servers, second recognized text corresponding to the second captured textual item; compare first characters of the first recognized text with second characters of the second recognized text, wherein comparing the first characters with the second characters includes: determine an edit distance between the first recognized text and the second recognized text, and determine that the edit distance satisfies an edit distance threshold, wherein the edit distance threshold depends on a type of at least one of the first recognized text or of the second recognized text; determine an overlap of the first bounding box relative to the second bounding box; determine that the first captured textual item matches the second captured textual item based at least in part on (i) the comparison of characters of the first recognized text with characters of the second recognized text and (ii) the overlap; generate merged text based at least in part on the first recognized text and the second recognized text; and display, on a display, the merged text.

6. The device of claim 5, wherein displaying the merged text on the display at the device comprises: replacing previously displayed text with the merged text.

7. The device of claim 6, wherein the previously displayed text is replaced with the merged text when an update condition is satisfied, the update condition being satisfied when: the merged text includes one or more differences from the previously displayed text, and the merged text has a confidence level greater than a confidence level of the previously displayed text.

8. The device of claim 7, wherein the confidence level of the merged text is determined by one or more of: an optical character recognition engine processing the text or a semantic analysis of the text.

9. The device of claim 5, wherein displaying the merged text on the display at the device comprises: removing the previously displayed text and indicating differences between the merged text and the previously displayed text.

10. The device of claim 5, wherein at least a portion of the displayed text is actionable to cause a computing event to occur.

11. The device of claim 5, further comprising: a camera operable to capture the first frame and the second frame.

12. The device of claim 5, wherein generating the merged text comprises selecting a first portion of the first recognized text based at least in part on a first confidence level and selecting a second portion of the second recognized text based at least in part on a second confidence level.

13. A method comprising: sending, from a device to one or more servers in communication with the device via a network, at least a portion of a first frame of image data including at least a first captured textual item within a first bounding box corresponding to a region of the first frame; receiving, from the one or more servers, first recognized text corresponding to the first captured textual item; sending, to the one or more servers, at least a portion of a second frame of image data including at least a second captured textual item within a second bounding box corresponding to a region of the second frame; receiving, from the one or more servers, second recognized text corresponding to the second captured textual item; comparing first characters of the first recognized text with second characters of the second recognized text, wherein comparing the first characters with the second characters includes: determining an edit distance between the first recognized text and the second recognized text, and determining that the edit distance satisfies an edit distance threshold, wherein the edit distance threshold depends on at least one of a type of the first recognized text or of the second recognized text; determining an overlap of the first bounding box relative with the second bounding box; determining that the first captured textual item matches the second captured textual item based at least in part on the comparison of characters of the first recognized text with characters of the second recognized text and on the overlap; generating merged text based at leas
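The confidence-based merging and the update condition recited in the dependent claims can be illustrated with a short sketch. It assumes that each recognized word carries its own confidence score, that the two word sequences are already aligned position by position, and that the confidence of a whole text is the mean of its word confidences. None of these choices is specified by the claims, and the names `Word`, `merge_words`, `mean_confidence`, and `should_update` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    text: str
    confidence: float  # assumed to lie in [0, 1], as reported by an OCR engine


def merge_words(first: List[Word], second: List[Word]) -> List[Word]:
    """Keep, at each aligned position, the word with the higher confidence."""
    merged = [a if a.confidence >= b.confidence else b
              for a, b in zip(first, second)]
    # If one sequence is longer, keep its unmatched tail unchanged.
    longer = first if len(first) > len(second) else second
    merged.extend(longer[len(merged):])
    return merged


def mean_confidence(words: List[Word]) -> float:
    """Assumed aggregate: average word confidence (0.0 for empty text)."""
    return sum(w.confidence for w in words) / len(words) if words else 0.0


def should_update(displayed: List[Word], merged: List[Word]) -> bool:
    """Replace displayed text only if it differs and the merge is more confident."""
    differs = [w.text for w in displayed] != [w.text for w in merged]
    return differs and mean_confidence(merged) > mean_confidence(displayed)


first = [Word("Main", 0.9), Word("Streef", 0.4)]
second = [Word("Maim", 0.5), Word("Street", 0.8)]
merged = merge_words(first, second)
print(" ".join(w.text for w in merged))  # Main Street
print(should_update(first, merged))      # True
```

Averaging is only one way to compare confidence levels; claim 8 also allows a semantic analysis of the text, which this sketch does not attempt.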

Assignees

Amazon Tech Inc

Inventors

Classifications

  • G06V30/262 (Primary)

    using context analysis, e.g. lexical, syntactic or semantic context · CPC title

  • of classification results, e.g. where the classifiers operate on the same input data · CPC title

  • of printed characters having additional code marks or containing code marks · CPC title

  • of classification results, e.g. of results related to same input data · CPC title

  • Character recognition · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9659224B1 cover?
Disclosed are techniques for merging optical character recognized (OCR'd) text from frames of image data. In some implementations, a device sends frames of image data to a server, where each frame includes at least a portion of a captured textual item. The server performs optical character recognition (OCR) on the image data of each frame. When OCR'd text from respective frames is returned to t…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06V30/262. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 23, 2017 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).