Methods and systems for detecting and recognizing text from images
US-2017004374-A1 · Jan 5, 2017 · US
US11715316B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11715316-B2 |
| Application number | US-202117542856-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 6, 2021 |
| Priority date | Sep 23, 2015 |
| Publication date | Aug 1, 2023 |
| Grant date | Aug 1, 2023 |
Official abstract text for this publication.
Methods and systems for training a neural network to distinguish between text documents and image documents are described. A corpus of text and image documents is obtained. A page of a text document is scanned by shifting a text window to a plurality of locations. In accordance with a determination that the text in the window at a respective location meets text line criteria, the text in the window is stored as a respective text snippet. A plurality of image windows are superimposed over at least one page of an image document. In accordance with a determination that the content of a respective image window meets image criteria, content of the image window is stored as a respective image snippet. The respective text snippet and the respective image snippet are provided to a classifier.
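The window-based snippet collection in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the page is assumed to be a grayscale pixel grid, and `crop`, `ink_fraction`, `collect_snippets`, and `looks_like_text_line` are hypothetical names. The abstract does not define its "text line criteria" or "image criteria", so the ink-fraction test below is only a placeholder.

```python
from typing import Callable, List, Tuple

Page = List[List[int]]  # grayscale rows: 0 = black ink, 255 = white paper


def crop(page: Page, top: int, left: int, height: int, width: int) -> Page:
    """Return the sub-grid of `page` covered by the window."""
    return [row[left:left + width] for row in page[top:top + height]]


def ink_fraction(window: Page, threshold: int = 128) -> float:
    """Fraction of pixels darker than `threshold` (assumed proxy for content)."""
    total = sum(len(row) for row in window)
    dark = sum(1 for row in window for pixel in row if pixel < threshold)
    return dark / total if total else 0.0


def looks_like_text_line(window: Page) -> bool:
    """Hypothetical stand-in for the abstract's unspecified text line criteria."""
    return 0.05 <= ink_fraction(window) <= 0.5


def collect_snippets(
    page: Page,
    height: int,
    width: int,
    stride: int,
    meets_criteria: Callable[[Page], bool],
) -> List[Tuple[int, int, Page]]:
    """Shift a window over the page and keep the windows that pass the test."""
    snippets = []
    for top in range(0, len(page) - height + 1, stride):
        for left in range(0, len(page[0]) - width + 1, stride):
            window = crop(page, top, left, height, width)
            if meets_criteria(window):
                snippets.append((top, left, window))
    return snippets


# A 4x8 blank page with one short dark stroke on the second row.
demo_page = [[255] * 8 for _ in range(4)]
for col in range(4):
    demo_page[1][col] = 0

demo_snippets = collect_snippets(demo_page, 2, 4, 2, looks_like_text_line)
```

The same loop could be reused with a different predicate for image windows; the snippets kept from each pass would then be labeled and handed to the classifier for training.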
Opening claim text (preview).
What is claimed is:

1. A method implemented by an electronic device having one or more processors for determining if a document is a text page, the method comprising: partitioning the document into a plurality of cells; scaling each of the cells to a standardized number of pixels to provide a corresponding snippet for each of the cells; classifying the snippets, using a neural network, to determine (i) a first set of cells classified as text and (ii) a second set of cells classified as non-text; determining a volume of text for the document based on a total amount of text in the document corresponding to a sum of an amount of text in each cell of the first set of cells; and in response to a determination that (i) the total amount of text in the document is within a predetermined range and (ii) the first set of cells are aligned to one or more horizontal or vertical lines, determining that the document is a text page.

2. The method of claim 1, further comprising: in response to a determination that (i) the total amount of text in the document is not within the predetermined range and (ii) that partitioning criteria are met for the second set of cells, partitioning the second set of cells to form a partitioned set of cells; scaling each of the partitioned cells of the partitioned set of cells to a standardized number of pixels to provide a respective snippet for each of the partitioned cells of the partitioned set of cells; classifying the respective snippets, using a neural network, to determine (i) a first set of partitioned cells classified as text and (ii) a second set of partitioned cells classified as non-text; determining an updated volume of text for the document based on an updated total amount of text in the document corresponding to a sum of an amount of text in each cell of the first set of cells and each cell of the first set of partitioned cells; and in response to a determination that the updated total amount of text in the document is not within the predetermined range, determining that the document is a text page.

3. The method of claim 2, further comprising: in response to a determination that the updated total amount of text in the document is not within the predetermined range and (ii) that partitioning criteria are not met for the second set of partitioned cells, determining whether the first set of cells and the first set of partitioned cells have a satisfactory geometry; and in response to a determination that the first set of cells and the first set of partitioned cells have a satisfactory geometry, determining that the document is a text page.

4. The method of claim 3, further comprising: in response to a determination that the first set of cells and the first set of partitioned cells do not have a satisfactory geometry, determining that the document is not a text page.

5. The method of claim 2, wherein the respective snippets are classified in random order.

6. The method of claim 2, wherein the respective snippets are classified in an order that prioritizes respective snippets adjacent to snippets previously classified as text.

7. The method of claim 2, wherein partitioning the second set of cells to form the partitioned set of cells includes partitioning respective cells of the second set of cells into four cells.

8. The method of claim 1, wherein one or more cells of the first set of cells are aligned to form at least one text line and wherein the at least one text line is one of: horizontal or vertical.

9. The method of claim 1, wherein one or more cells of the second set of cells are classified as one of an image or unknown.

10. The method of claim 1, wherein the document is captured using a smartphone.

11. The method of claim 1, wherein the neural network is trained using a plurality of image documents and a plurality of text pages having various formats, layouts, text sizes, ranges of word, line and paragraph spacing.

12. A non-transitory computer readable medium storing one or more programs, the one or more programs comprising instructions, which when executed by a device with a camera, cause the device to: partition a document into a plurality of cells; scale each of the cells to a standardized number of pixels to provide a corresponding snippet for each of the cells; classify the snippets, using a neural network, to determine (i) a first set of cells classified as text and (ii) a second set of cells classified as non-text; determine a volume of text for the document based on a total amount of text in the document corresponding to a sum of an amount of text in each cell of the first set of cells; and in response to a determination that (i) the total amount of text in the document is not within a predetermined range and (ii) the first set of cells is aligned to one or more horizontal or vertical lines, determine that the document is a text page.

13. The non-transitory computer readable medium of claim 12, wherein the one or more programs further comprising instructions, which when executed by the device, cause the device to: in response to a determination that (i) the total amount of text in the document is not within the predetermined range and (ii) that partitioning criteria are met for the second set of cells, partition the second set of cells to form a partitioned set of cells; scale each of the partitioned cells of the partitioned set of cells to a standardized number of pixels to provide a respective snippet for each of the partitioned cells of the partitioned set of cells; classify the respective snippets, using a neural network, to determine (i) a first set of partitioned cells classified as text and (ii) a second set of partitioned cells classified as non-text; determine an updated volume of text for the document based on an updated total amount of text in the document corresponding to a sum of an amount of text in each cell of the first set of cells and each cell of the first set of partitioned cells; and in response to a determination that the updated total amount of text in the document is not within the predetermined range, determine that the document is a text page.

14. The non-transitory computer readable medium of claim 13, wherein the one or more programs further comprising instructions, which when executed by the device, cause the device to: in response to a determination that the updated total amount of text in the document is not within the predetermined range and (ii) that partitioning criteria are not met for the second set of partitioned cells, determine whether the first set of cells and the first set of partitioned cells have a satisfactory geometry; and in response to a determination that the first set of cells and the first set of partitioned cells have a satisfactory geometry, determine that the document is a text page.

15. The non-transitory computer readable medium of claim 14, wherein the one or more programs further comprising instructions, which when executed by the device, cause the device to: in response to a determination that the first set of cells and the first set of partitioned cells do not have a satisfactory geometry, determine that the document is not a text page.

16. The non-transitory computer readable medium of claim 13, wherein the respective snippets are classified in random order.

17. The non-transitory computer readable medium of claim 13, wherein the respective snippets are classified in an order that prioritizes respective snippets adjacent to snippets previously classified as text.

18. A device with a camera, the device comprising
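The cell-based decision flow described in claims 1 to 4 and 7 can be sketched as below. This is a loose illustration under stated assumptions, not the claimed implementation: `Cell`, `grid`, `split_four`, `aligned`, `satisfactory_geometry`, and `is_text_page` are hypothetical names; the `classify` callable stands in for both the scaling step and the neural network; and the alignment, geometry, and partitioning tests are placeholders, since the claims do not spell out those criteria. The sketch accepts a page when the text total falls within the range and the text cells align (as in claim 1), and otherwise subdivides non-text cells until they are too small, then falls back to a geometry check.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Cell:
    x: int
    y: int
    w: int
    h: int


def grid(width: int, height: int, n: int) -> List[Cell]:
    """Partition a width x height page into n x n cells."""
    xs = [i * width // n for i in range(n + 1)]
    ys = [j * height // n for j in range(n + 1)]
    return [Cell(xs[i], ys[j], xs[i + 1] - xs[i], ys[j + 1] - ys[j])
            for j in range(n) for i in range(n)]


def split_four(c: Cell) -> List[Cell]:
    """Split one cell into four quadrants (cf. claim 7)."""
    hw, hh = c.w // 2, c.h // 2
    return [Cell(c.x, c.y, hw, hh), Cell(c.x + hw, c.y, c.w - hw, hh),
            Cell(c.x, c.y + hh, hw, c.h - hh),
            Cell(c.x + hw, c.y + hh, c.w - hw, c.h - hh)]


def aligned(cells: List[Cell]) -> bool:
    """Assumed test: at least two text cells share a top or a left edge."""
    tops = Counter(c.y for c in cells)
    lefts = Counter(c.x for c in cells)
    return (any(k >= 2 for k in tops.values())
            or any(k >= 2 for k in lefts.values()))


def satisfactory_geometry(cells: List[Cell], page_area: int) -> bool:
    """Assumed test: text cells cover at least half of the page."""
    return sum(c.w * c.h for c in cells) >= 0.5 * page_area


def is_text_page(width: int, height: int,
                 classify: Callable[[Cell], Tuple[bool, float]],
                 text_range: Tuple[float, float],
                 grid_n: int = 4, min_side: int = 8) -> bool:
    lo, hi = text_range
    text_cells: List[Cell] = []
    total = 0.0
    pending = grid(width, height, grid_n)
    while True:
        non_text = []
        for cell in pending:
            is_text, amount = classify(cell)  # scaling + network hidden here
            if is_text:
                text_cells.append(cell)
                total += amount
            else:
                non_text.append(cell)
        if lo <= total <= hi and aligned(text_cells):
            return True
        # Assumed partitioning criterion: quadrants stay >= min_side wide/tall.
        splittable = [c for c in non_text if min(c.w, c.h) >= 2 * min_side]
        if not splittable:
            return satisfactory_geometry(text_cells, width * height)
        pending = [q for c in splittable for q in split_four(c)]


# Toy classifiers: the top half of a 64x64 page is "text"; or nothing is.
top_half_text = lambda c: (c.y < 32, float(c.w * c.h))
nothing_is_text = lambda c: (False, 0.0)
```

With these toy classifiers, `is_text_page(64, 64, top_half_text, (1000, 3000))` accepts the page on the first pass, while `is_text_page(64, 64, nothing_is_text, (1000, 3000))` subdivides once, runs out of splittable cells, and rejects the page through the geometry fallback.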
Classification of content, e.g. text, photographs or tables · CPC title
Scaling of whole images or parts thereof, e.g. expanding or contracting · CPC title
Analysis of geometric attributes · CPC title
Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title