US9959693B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9959693-B2 |
| Application number | US-201514814655-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 31, 2015 |
| Priority date | Jul 31, 2015 |
| Publication date | May 1, 2018 |
| Grant date | May 1, 2018 |
Official abstract text for this publication.
In a method for identifying desynchronization between representations of data obtained from an ordered plurality of documents, a processor can receive ordered first and second pluralities of data strings obtained from the respective plurality of documents; compare each data string in the first plurality to the corresponding data string in the second plurality and to each data string sequentially before or sequentially after the corresponding data string in the second plurality; based on the comparison, designate each data string in the first plurality as being one of synchronized, leading, or trailing; identify a continuous sequence of N data strings in the first plurality that all have a designation of leading or all have a designation of trailing, where N equals or exceeds a specified sequence threshold; and generate a single error signal that identifies all N of the data strings in the continuous sequence as being desynchronized.
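The procedure in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function names `designate` and `desynchronized_runs`, the `window` parameter (how many neighbors before and after are checked), the use of exact string equality as the matching test, and the extra `"unknown"` label for strings that match nothing nearby are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def designate(first: List[str], second: List[str], window: int = 1) -> List[str]:
    """Label each string in `first` relative to `second` (exact matching assumed)."""
    labels = []
    for i, s in enumerate(first):
        label = "unknown"  # assumption: used when nothing nearby matches
        if i < len(second) and s == second[i]:
            label = "synchronized"
        else:
            for k in range(1, window + 1):
                # A match sequentially before the corresponding string: leading.
                if 0 <= i - k < len(second) and s == second[i - k]:
                    label = "leading"
                    break
                # A match sequentially after the corresponding string: trailing.
                if i + k < len(second) and s == second[i + k]:
                    label = "trailing"
                    break
        labels.append(label)
    return labels


def desynchronized_runs(labels: List[str], threshold: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges of same-label leading/trailing runs."""
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] in ("leading", "trailing") and i - start >= threshold:
                runs.append((start, i))  # one entry per run: a single error signal
            start = i
    return runs


first = ["A", "B", "C", "D", "E"]
second = ["A", "X", "B", "C", "D"]
labels = designate(first, second)
runs = desynchronized_runs(labels, threshold=3)
```

In this example the second sequence contains one extra item (`"X"`), so items 1 to 3 of the first sequence each match the string one position later and are labeled trailing. Because the run length reaches the threshold of 3, the sketch reports the whole run once, as the range `(1, 4)`, rather than raising three separate errors.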
Opening claim text (preview).
What is claimed is:
1. A method, comprising: receiving ordered first and second pluralities of data strings obtained from an ordered plurality of documents, the ordered first plurality of data strings obtained from an optical scanner, the ordered second plurality of data strings obtained from a magnetic scanner; comparing each data string in the first plurality to a corresponding data string in the second plurality and to each data string sequentially before or sequentially after the corresponding data string in the second plurality; based on the comparing, designating each data string in the first plurality as being one of unknown, synchronized, leading, or trailing; when a continuous sequence of N data strings in the first plurality have a same designation as leading or trailing, generating an error signal associated with the first and second pluralities identifying all N of the data strings in the continuous sequence as being desynchronized; and making only documents associated with data strings in the first plurality that are synchronized available for viewing.
2. The method of claim 1, further comprising: receiving an ordered plurality of images of the respective ordered plurality of documents; and performing optical character recognition on the ordered plurality of images to generate the ordered first plurality of data strings.
3. The method of claim 2, further comprising: magnetically reading a magnetic ink recognition line from each of the plurality of documents to generate the second plurality of data strings; and imaging at least the magnetic ink recognition line from each of the plurality of documents to form the plurality of images.
4. The method of claim 1, wherein designating each data string in the first plurality as being one of synchronized, leading, or trailing comprises: designating each data string in the first plurality as being one of: synchronized, when the data string in the first plurality matches the corresponding data string in the second plurality; leading, when the data string in the first plurality matches a data string sequentially before the corresponding data string in the second plurality; or trailing, when the data string in the first plurality matches a data string sequentially after the corresponding data string in the second plurality.
5. The method of claim 1, wherein comparing each data string in the first plurality to the corresponding data string in the second plurality and to each data string sequentially before or sequentially after the corresponding data string in the second plurality comprises: for each data string in the first plurality: calculating respective Levenshtein distances between said data string in the first plurality and the corresponding data string in the second plurality, and between said first data string in the first plurality and each data string sequentially before or sequentially after the corresponding data string in the second plurality; selecting the lowest of the calculated Levenshtein distances; selecting the data string of the second plurality corresponding to the selected Levenshtein distance; and designating the selected data string of the second plurality as matching said data string of the first plurality.
6. The method of claim 5, further comprising: designating the selected data string of the second plurality as being unknown if the lowest of the calculated Levenshtein distances equals or exceeds a specified Levenshtein distance threshold.
7. The method of claim 6, further comprising: identifying a continuous sequence of N data strings in the first plurality that all have a designation of leading or unknown, or all have a designation of trailing or unknown, where N equals or exceeds the specified sequence threshold.
8. The method of claim 1, wherein generating an error signal associated with the first and second pluralities identifying all N of the data strings in the continuous sequence as being desynchronized comprises: generating a single error signal associated with the first and second pluralities identifying all N of the data strings in the continuous sequence as being desynchronized.
9. The method of claim 1, wherein N equals or exceeds a specified sequence threshold.
10. A method, comprising: imaging each of an ordered plurality of documents to form a respective ordered plurality of images; performing optical character recognition on the ordered plurality of images to generate an ordered first plurality of data strings; magnetically reading a magnetic ink recognition line from each of the plurality of documents to generate an ordered second plurality of data strings; comparing each data string in the first plurality to a corresponding data string in the second plurality and to each data string sequentially before or sequentially after the corresponding data string in the second plurality; based on the comparing, designating each data string in the first plurality as being one of unknown, synchronized, leading, or trailing; when a continuous sequence of N data strings in the first plurality have a same designation as leading or trailing, generating an error signal associated with the first and second pluralities identifying all N of the data strings in the continuous sequence as being desynchronized, where N equals or exceeds a specified sequence threshold; and making only images associated with data strings in the first plurality that synchronized available for viewing.
11. The method of claim 10, wherein designating each data string in the first plurality as being one of synchronized, leading, or trailing comprises: designating each data string in the first plurality as being one of: synchronized, if the data string in the first plurality matches the corresponding data string in the second plurality; leading, if the data string in the first plurality matches a data string sequentially before the corresponding data string in the second plurality; or trailing, if the data string in the first plurality matches a data string sequentially after the corresponding data string in the second plurality.
12. The method of claim 10, wherein comparing each data string in the first plurality to the corresponding data string in the second plurality and to each data string sequentially before or sequentially after the corresponding data string in the second plurality comprises: for each data string in the first plurality: calculating respective Levenshtein distances between said data string in the first plurality and the corresponding data string in the second plurality, and between said first data string in the first plurality and each data string sequentially before or sequentially after the corresponding data string in the second plurality; selecting the lowest of the calculated Levenshtein distances; selecting the data string of the second plurality corresponding to the selected Levenshtein distance; and designating the selected data string of the second plurality as matching said data string of the first plurality.
13. The method of claim 12, further comprising: designating the selected data string of the second plurality as being unknown if the lowest of the calculated Levenshtein distances equals or exceeds a specified Levenshtein distance threshold.
14. The method of claim 13, further comprising: identifying a continuous sequence of N data strings in the first plurality that all have a designation of leading or unknown, or all have a designation of trailing or unknown, where N equals or exceeds the specified sequence threshold.
15. The method of claim 10, wherein gen
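Claims 5 to 7 (and their counterparts 12 to 14) replace exact matching with a nearest-neighbor choice by Levenshtein distance, add an unknown designation above a distance threshold, and let unknown entries extend a leading or trailing run. The following Python sketch illustrates that reading. It is not taken from the patent: the names `levenshtein`, `designate_fuzzy`, and `runs_allowing_unknown`, the `window` and default threshold values, the tie-breaking order, the choice to attach the unknown label to the string in the first plurality, and the rule that a run must contain at least one leading or trailing entry are all assumptions made for the example.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def designate_fuzzy(first: List[str], second: List[str],
                    distance_threshold: int = 2, window: int = 1) -> List[str]:
    """Label each string in `first` by its closest neighbor in `second`."""
    labels = []
    for i, s in enumerate(first):
        candidates = []  # (distance, |offset|, label); min() picks the closest
        for offset in range(-window, window + 1):
            j = i + offset
            if 0 <= j < len(second):
                label = ("synchronized" if offset == 0
                         else "leading" if offset < 0 else "trailing")
                candidates.append((levenshtein(s, second[j]), abs(offset), label))
        if not candidates:
            labels.append("unknown")
            continue
        # Assumed tie-break: smaller offset first, then alphabetical label.
        best_distance, _, best_label = min(candidates)
        # Distance at or above the threshold is treated as no reliable match.
        labels.append("unknown" if best_distance >= distance_threshold else best_label)
    return labels


def runs_allowing_unknown(labels: List[str],
                          sequence_threshold: int) -> List[Tuple[int, int, str]]:
    """Half-open runs made only of one direction plus 'unknown' entries."""
    runs = []
    for direction in ("leading", "trailing"):
        start = None
        for i in range(len(labels) + 1):
            inside = i < len(labels) and labels[i] in (direction, "unknown")
            if inside and start is None:
                start = i
            elif not inside and start is not None:
                segment = labels[start:i]
                # Assumption: require at least one directional label in the run.
                if len(segment) >= sequence_threshold and direction in segment:
                    runs.append((start, i, direction))
                start = None
    return runs


ocr = ["A1O0", "B200", "C300"]           # hypothetical OCR output (letter O misread)
micr = ["Z999", "A100", "B200", "C300"]  # hypothetical magnetic-read output
loose = designate_fuzzy(ocr, micr, distance_threshold=2)
strict = designate_fuzzy(ocr, micr, distance_threshold=1)
flagged = runs_allowing_unknown(strict, sequence_threshold=3)
```

With the looser threshold, the misread string `"A1O0"` is still one edit away from `"A100"` and is labeled trailing. With the stricter threshold it becomes unknown, yet the run of three is still flagged because unknown entries are allowed to sit inside a trailing run.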
Detection or correction of errors, e.g. by rescanning the pattern · CPC title
using checkcodes, e.g. coded numbers derived from serial number and denomination · CPC title
Matching criteria, e.g. proximity measures · CPC title
Recognition of characters printed with magnetic ink (G06V30/2247 takes precedence) · CPC title
Character recognition · CPC title