Fast identification of text intensive pages from photographs

US11715316B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11715316-B2
Application number: US-202117542856-A
Country: US
Kind code: B2
Filing date: Dec 6, 2021
Priority date: Sep 23, 2015
Publication date: Aug 1, 2023
Grant date: Aug 1, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for training a neural network to distinguish between text documents and image documents are described. A corpus of text and image documents is obtained. A page of a text document is scanned by shifting a text window to a plurality of locations. In accordance with a determination that the text in the window at a respective location meets text line criteria, the text in the window is stored as a respective text snippet. A plurality of image windows are superimposed over at least one page of an image document. In accordance with a determination that the content of a respective image window meets image criteria, content of the image window is stored as a respective image snippet. The respective text snippet and the respective image snippet are provided to a classifier.
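
The window-shifting step in the abstract can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say what the "text line criteria" or "image criteria" are, so the ink-fraction test `looks_like_text_line`, the grayscale list-of-rows page representation, and every function name here are assumptions made for the example.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale rows; assumed convention: 0 = ink, 255 = blank paper


def crop(page: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in page[top:top + height]]


def ink_fraction(window: Image, threshold: int = 128) -> float:
    """Fraction of pixels in the window darker than the threshold."""
    pixels = [p for row in window for p in row]
    return sum(p < threshold for p in pixels) / len(pixels)


def looks_like_text_line(window: Image) -> bool:
    """Hypothetical acceptance criterion: some ink, but not a solid block."""
    return 0.05 <= ink_fraction(window) <= 0.5


def collect_snippets(page: Image, height: int, width: int, stride: int,
                     accept: Callable[[Image], bool]) -> List[Image]:
    """Shift a window over the page and keep every window that `accept` approves."""
    snippets = []
    for top in range(0, len(page) - height + 1, stride):
        for left in range(0, len(page[0]) - width + 1, stride):
            window = crop(page, top, left, height, width)
            if accept(window):
                snippets.append(window)
    return snippets
```

The same `collect_snippets` loop could serve both halves of the corpus by passing a different `accept` function for text pages and for image pages; the stored snippets would then be the labelled training examples handed to the classifier.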

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by an electronic device having one or more processors for determining if a document is a text page, the method comprising: partitioning the document into a plurality of cells; scaling each of the cells to a standardized number of pixels to provide a corresponding snippet for each of the cells; classifying the snippets, using a neural network, to determine (i) a first set of cells classified as text and (ii) a second set of cells classified as non-text; determining a volume of text for the document based on a total amount of text in the document corresponding to a sum of an amount of text in each cell of the first set of cells; and in response to a determination that (i) the total amount of text in the document is within a predetermined range and (ii) the first set of cells are aligned to one or more horizontal or vertical lines, determining that the document is a text page.

2. The method of claim 1, further comprising: in response to a determination that (i) the total amount of text in the document is not within the predetermined range and (ii) that partitioning criteria are met for the second set of cells, partitioning the second set of cells to form a partitioned set of cells; scaling each of the partitioned cells of the partitioned set of cells to a standardized number of pixels to provide a respective snippet for each of the partitioned cells of the partitioned set of cells; classifying the respective snippets, using a neural network, to determine (i) a first set of partitioned cells classified as text and (ii) a second set of partitioned cells classified as non-text; determining an updated volume of text for the document based on an updated total amount of text in the document corresponding to a sum of an amount of text in each cell of the first set of cells and each cell of the first set of partitioned cells; and in response to a determination that the updated total amount of text in the document is not within the predetermined range, determining that the document is a text page.

3. The method of claim 2, further comprising: in response to a determination that the updated total amount of text in the document is not within the predetermined range and (ii) that partitioning criteria are not met for the second set of partitioned cells, determining whether the first set of cells and the first set of partitioned cells have a satisfactory geometry; and in response to a determination that the first set of cells and the first set of partitioned cells have a satisfactory geometry, determining that the document is a text page.

4. The method of claim 3, further comprising: in response to a determination that the first set of cells and the first set of partitioned cells do not have a satisfactory geometry, determining that the document is not a text page.

5. The method of claim 2, wherein the respective snippets are classified in random order.

6. The method of claim 2, wherein the respective snippets are classified in an order that prioritizes respective snippets adjacent to snippets previously classified as text.

7. The method of claim 2, wherein partitioning the second set of cells to form the partitioned set of cells includes partitioning respective cells of the second set of cells into four cells.

8. The method of claim 1, wherein one or more cells of the first set of cells are aligned to form at least one text line and wherein the at least one text line is one of: horizontal or vertical.

9. The method of claim 1, wherein one or more cells of the second set of cells are classified as one of an image or unknown.

10. The method of claim 1, wherein the document is captured using a smartphone.

11. The method of claim 1, wherein the neural network is trained using a plurality of image documents and a plurality of text pages having various formats, layouts, text sizes, ranges of word, line and paragraph spacing.

12. A non-transitory computer readable medium storing one or more programs, the one or more programs comprising instructions, which when executed by a device with a camera, cause the device to: partition a document into a plurality of cells; scale each of the cells to a standardized number of pixels to provide a corresponding snippet for each of the cells; classify the snippets, using a neural network, to determine (i) a first set of cells classified as text and (ii) a second set of cells classified as non-text; determine a volume of text for the document based on a total amount of text in the document corresponding to a sum of an amount of text in each cell of the first set of cells; and in response to a determination that (i) the total amount of text in the document is not within a predetermined range and (ii) the first set of cells is aligned to one or more horizontal or vertical lines, determine that the document is a text page.

13. The non-transitory computer readable medium of claim 12, wherein the one or more programs further comprising instructions, which when executed by the device, cause the device to: in response to a determination that (i) the total amount of text in the document is not within the predetermined range and (ii) that partitioning criteria are met for the second set of cells, partition the second set of cells to form a partitioned set of cells; scale each of the partitioned cells of the partitioned set of cells to a standardized number of pixels to provide a respective snippet for each of the partitioned cells of the partitioned set of cells; classify the respective snippets, using a neural network, to determine (i) a first set of partitioned cells classified as text and (ii) a second set of partitioned cells classified as non-text; determine an updated volume of text for the document based on an updated total amount of text in the document corresponding to a sum of an amount of text in each cell of the first set of cells and each cell of the first set of partitioned cells; and in response to a determination that the updated total amount of text in the document is not within the predetermined range, determine that the document is a text page.

14. The non-transitory computer readable medium of claim 13, wherein the one or more programs further comprising instructions, which when executed by the device, cause the device to: in response to a determination that the updated total amount of text in the document is not within the predetermined range and (ii) that partitioning criteria are not met for the second set of partitioned cells, determine whether the first set of cells and the first set of partitioned cells have a satisfactory geometry; and in response to a determination that the first set of cells and the first set of partitioned cells have a satisfactory geometry, determine that the document is a text page.

15. The non-transitory computer readable medium of claim 14, wherein the one or more programs further comprising instructions, which when executed by the device, cause the device to: in response to a determination that the first set of cells and the first set of partitioned cells do not have a satisfactory geometry, determine that the document is not a text page.

16. The non-transitory computer readable medium of claim 13, wherein the respective snippets are classified in random order.

17. The non-transitory computer readable medium of claim 13, wherein the respective snippets are classified in an order that prioritizes respective snippets adjacent to snippets previously classified as text.

18. A device with a camera, the device comprising
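
To make the flow of the method claims easier to follow, here is a rough sketch of the partition, classify, sum, and test loop from claim 1, with the four-way subdivision of non-text cells from claims 2 and 7 as a refinement step. It is a simplification under stated assumptions, not the claimed implementation: the `classify` callable stands in for "scale the cell to a snippet and run the neural network", the initial 4 x 4 grid and `min_size` limit are arbitrary, `aligned` is a hypothetical alignment test, and the geometry check of claims 3 and 4 is reduced to that same test. The sketch also subdivides whenever the claim 1 test fails, which is looser than the conditions recited in claim 2.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Cell:
    top: int
    left: int
    height: int
    width: int


# Assumed stand-in for snippet scaling plus the neural network:
# returns (is_text, amount_of_text) for one cell.
Classifier = Callable[[Cell], Tuple[bool, float]]


def grid(cell: Cell, rows: int, cols: int) -> List[Cell]:
    """Partition a cell into rows x cols equal sub-cells."""
    h, w = cell.height // rows, cell.width // cols
    return [Cell(cell.top + r * h, cell.left + c * w, h, w)
            for r in range(rows) for c in range(cols)]


def aligned(cells: List[Cell]) -> bool:
    """Hypothetical test: at least two text cells share a row or a column."""
    tops = [c.top for c in cells]
    lefts = [c.left for c in cells]
    return (any(tops.count(t) >= 2 for t in tops)
            or any(lefts.count(x) >= 2 for x in lefts))


def is_text_page(page: Cell, classify: Classifier,
                 lo: float, hi: float, min_size: int = 8) -> bool:
    text_cells: List[Cell] = []
    total = 0.0                      # running volume of text
    pending = grid(page, 4, 4)       # assumed initial partition
    while True:
        non_text: List[Cell] = []
        for cell in pending:
            is_text, amount = classify(cell)
            if is_text:
                text_cells.append(cell)
                total += amount
            else:
                non_text.append(cell)
        # Claim 1: text volume within range and text cells aligned.
        if lo <= total <= hi and aligned(text_cells):
            return True
        # Claims 2 and 7: split each sufficiently large non-text cell into four.
        pending = [quarter for c in non_text
                   if min(c.height, c.width) >= 2 * min_size
                   for quarter in grid(c, 2, 2)]
        if not pending:
            # Simplified stand-in for the geometry check of claims 3 and 4.
            return aligned(text_cells)
```

For example, with a 64 x 64 page and a classifier that reports one unit of text for every cell in the top row, `is_text_page(Cell(0, 0, 64, 64), classify, lo=3, hi=10)` accepts the page on the first pass, because four aligned text cells give a total of 4.0. The loop always terminates because each subdivision halves the cell size until `min_size` is reached.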

Assignees

Evernote Corp

Inventors

Classifications

  • G06V30/413 · Primary

    Classification of content, e.g. text, photographs or tables · CPC title

  • Scaling of whole images or parts thereof, e.g. expanding or contracting · CPC title

  • Analysis of geometric attributes · CPC title

  • Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11715316B2 cover?
Methods and systems for training a neural network to distinguish between text documents and image documents are described. A corpus of text and image documents is obtained. A page of a text document is scanned by shifting a text window to a plurality of locations. In accordance with a determination that the text in the window at a respective location meets text line criteria, the text in the wi…
Who is the assignee on this patent?
Evernote Corp
What technology area does this patent fall under?
Primary CPC classification G06V30/413. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 1, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).