Automatically classifying page images

US9594833B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9594833-B2
Application number: US-201213621054-A
Country: US
Kind code: B2
Filing date: Sep 15, 2012
Priority date: Aug 30, 2006
Publication date: Mar 14, 2017
Grant date: Mar 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for automatically classifying images of pages of a source, such as a book, into classifications such as front cover, copyright page, table of contents, text, index, etc. In one embodiment, three phases are provided in the classification process. During a first phase of the classification process, a first classifier may be used to determine a preliminary classification of a page image based on single-page criteria. During a second phase of the classification process, a second classifier may be used to determine a final classification for the page image based on multiple-page and/or global criteria. During an optional third phase of classification, a verifier may be used to verify the final classification of the page image based on verification criteria. If automatic classification fails, the page image may be passed on to a human operator for manual classification.
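
The phased flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Page` type, the function names, the keyword rules in the toy classifiers, and the `manual_review` label are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

MANUAL = "manual_review"  # hypothetical label for pages handed to a human operator


@dataclass
class Page:
    index: int  # position of the page within the source
    text: str   # recognized text of the page image (assumed to be available)


def classify_pages(
    pages: Sequence[Page],
    first_classifier: Callable[[Page], str],
    second_classifier: Callable[[Page, List[str], int], Optional[str]],
    verifier: Optional[Callable[[Page, str], bool]] = None,
) -> List[str]:
    # Phase 1: preliminary label for each page from single-page criteria only.
    preliminary = [first_classifier(p) for p in pages]
    results = []
    for page in pages:
        # Phase 2: final label, with access to all preliminary labels
        # (multiple-page criteria) and the page count (global criteria).
        final = second_classifier(page, preliminary, len(pages))
        # Phase 3 (optional): verify; on failure, fall back to manual review.
        if final is None or (verifier is not None and not verifier(page, final)):
            final = MANUAL
        results.append(final)
    return results


# Toy classifiers, purely illustrative.
def toy_first(page: Page) -> str:
    text = page.text.lower()
    if "copyright" in text:
        return "copyright page"
    if "contents" in text:
        return "table of contents"
    if "index" in text:
        return "index"
    return "text"


def toy_second(page: Page, preliminary: List[str], total: int) -> Optional[str]:
    label = preliminary[page.index]
    # An "index" page in the first half of the source is treated as body text.
    if label == "index" and page.index < total / 2:
        return "text"
    return label


def toy_verifier(page: Page, label: str) -> bool:
    # Reject any label assigned to a page with no recognized text.
    return bool(page.text.strip())


pages = [
    Page(0, "Copyright 2006"),
    Page(1, "Contents"),
    Page(2, "See the index for details"),
    Page(3, ""),
    Page(4, "Index A B C"),
]
labels = classify_pages(pages, toy_first, toy_second, toy_verifier)
```

In this sketch the third page mentions "index" but sits in the first half of the source, so the second phase relabels it as text, and the blank fourth page fails verification and is routed to manual review.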

First claim

Opening claim text (preview).

What is claimed is:
1. A system for classifying a type of page of a set of serially organized pages, the system comprising: one or more processors configured with computer-executable instructions that, when executed, cause the one or more processors to: determine a preliminary classification for a page of the set of serially organized pages based at least in part on content in the page, wherein the preliminary classification is determined independent of content in other pages of the set of serially organized pages; identify a location of the page within the set of serially organized pages, wherein the location of the page indicates an ordering of the page relative to other pages in the set of serially organized pages; determine an updated classification for the page, wherein the updated classification is determined at least from inputs comprising the preliminary classification for the page, as based at least in part on the content in the page, and the location of the page relative to other pages in the set of serially organized pages; and utilizing the updated classification to fulfill a user request for a page corresponding to the updated classification.
2. The system of claim 1, wherein the updated classification corresponds to at least one of a front cover, front face, front matter, copyright page, table of contents, text, index, back matter, or back cover.
3. The system of claim 1, wherein the inputs further comprise content in multiple pages of the set of serially organized pages.
4. The system of claim 1, wherein the one or more processors are further configured to verify the updated classification based at least in part on a set of verification criteria.
5. The system of claim 1, wherein at least one of the preliminary classification or the updated classification is based at least in part on dynamic information that is determined during classification of the page.
6. The system of claim 1, wherein at least one of the preliminary classification or the updated classification is based at least in part on a linear combination of classification criteria.
7. A computer-implemented method for classifying a type of page of a set of serially organized pages, the computer-implemented method comprising: determining a classification for a page of the set of serially organized pages based at least in part on content in the page independent of content in other pages of the set of serially organized pages; identifying a location of the page within the set of serially organized pages, wherein the location of the page indicates an ordering of the page relative to other pages in the set of serially organized pages; determining a modified classification for the page, wherein the modified classification is determined at least from inputs comprising the classification for the page, as based at least in part on the content in the page, and the location of the page relative to other pages in the set of serially organized pages; and storing the modified classification in at least one data store.
8. The computer-implemented method of claim 7, wherein determining a classification comprises: applying at least one potential classification to the page; determining a classification score for the at least one potential classification based at least in part on classification criteria; and determining whether the classification score for the at least one potential classification exceeds a threshold value.
9. The computer-implemented method of claim 7 further comprising verifying the modified classification based at least in part on a set of verification criteria.
10. The computer-implemented method of claim 9 further comprising: determining that the modified classification does not satisfy the set of verification criteria, and transmitting the page for manual classification.
11. The computer-implemented method of claim 7, wherein the inputs further comprise global page data of the set of serially organized pages.
12. The computer-implemented method of claim 11, wherein the global page data includes aggregate page information collected from all pages of the set of serially organized pages.
13. The computer-implemented method of claim 7, wherein at least one of the classification or the modified classification is based at least in part on a linear combination of classification criteria.
14. The computer-implemented method of claim 7, wherein the location of the page within the set of serially organized pages corresponds to one of a location of the page within a first portion of the set of serially organized pages or a location of the page within a second portion of the set of serially organized pages.
15. The computer-implemented method of claim 14, wherein: the set of serially organized pages is composed of a first half and a second half; the first portion of the set of serially organized pages is the first half of the set of serially organized pages; and the second portion of the set of serially organized pages is the second half of the set of serially organized pages.
16. The computer-implemented method of claim 7, wherein determining the modified classification for the page comprises: identifying one or more classifications of the page as excluded classifications based on the location of the page within the set of serially organized pages; and selecting the modified classification for the page as a classification other than the excluded classifications.
17. A non-transitory computer-readable medium having computer-executable instructions executable by a computing system to cause the computing system to: determine a first classification for a page of a set of serially organized pages based at least in part on content in the page, wherein the first classification is independent of content in other pages of the set of serially organized pages; identify a location of the page within the set of serially organized pages, wherein the location of the page indicates an ordering of the page relative to other pages in the set of serially organized pages; determine a second classification for the page, wherein the second classification is determined at least from inputs comprising the first classification for the page, as based at least in part on the content in the page, and the location of the page relative to other pages in the set of serially organized pages; and store the second classification in at least one data store.
18. The non-transitory computer-readable medium of claim 17 further comprising a verification module configured to verify the second classification based at least in part on a set of verification criteria.
19. The non-transitory computer-readable medium of claim 18, wherein the verification module is further configured to determine that the second classification does not satisfy the set of verification criteria, and to transmit the page for manual classification.
20. The non-transitory computer-readable medium of claim 17, wherein the inputs further comprise aggregate page information collected from all pages of the set of serially organized pages.
21. The non-transitory computer-readable medium of claim 17, wherein at least one of the first classification or the second classification is based at least in part on static information determined prior to the classification of the page image.
22. The non-transitory computer-readable medium of claim 17, wherein at least one of the first classification or the second classification is based at least in part on a Bayes
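
Claims 6, 8, 13, 15, and 16 together describe a score built from a linear combination of criteria, a threshold test, and location-based exclusion of classifications. The sketch below shows one way those pieces could fit together. It is an illustration under assumptions, not the method the patent discloses: the feature names, weights, threshold, and the specific labels excluded in each half are all made up for the example.

```python
from typing import Dict, Optional, Set


def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    # Linear combination of criteria: sum of weight * feature value.
    # Features without a weight for this label contribute nothing.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def score_labels(
    features: Dict[str, float], weights_by_label: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    # Apply every potential classification to the page and score each one.
    return {
        label: linear_score(features, weights)
        for label, weights in weights_by_label.items()
    }


def excluded_labels(position: int, total: int) -> Set[str]:
    # Hypothetical rule: some labels are ruled out by which half the page is in.
    if position < total / 2:
        return {"index", "back cover"}
    return {"front cover", "copyright page", "table of contents"}


def choose_label(
    scores: Dict[str, float], position: int, total: int, threshold: float
) -> Optional[str]:
    # Keep labels that exceed the threshold and are not excluded by location,
    # then pick the highest-scoring survivor (None means no label qualified).
    excluded = excluded_labels(position, total)
    candidates = {
        label: score
        for label, score in scores.items()
        if label not in excluded and score > threshold
    }
    return max(candidates, key=candidates.get) if candidates else None


# Illustrative weights and features; none of these values come from the patent.
WEIGHTS = {
    "index": {"short_line_ratio": 0.8, "digit_ratio": 0.6},
    "text": {"short_line_ratio": -0.5, "digit_ratio": -0.2, "word_count": 0.003},
}
features = {"short_line_ratio": 0.9, "digit_ratio": 0.5, "word_count": 300}
scores = score_labels(features, WEIGHTS)

early = choose_label(scores, position=10, total=200, threshold=0.3)
late = choose_label(scores, position=190, total=200, threshold=0.3)
```

With these made-up numbers the page scores highest as an index. Near the end of the source that label is kept, but near the start it is excluded and the next qualifying label is chosen instead. A `None` result would correspond to the case where no classification qualifies, which in the claims leads to manual classification.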

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9594833B2 cover?
Systems and methods are disclosed for automatically classifying images of pages of a source, such as a book, into classifications such as front cover, copyright page, table of contents, text, index, etc. In one embodiment, three phases are provided in the classification process. During a first phase of the classification process, a first classifier may be used to determine a preliminary classif…
Who is the assignee on this patent?
Behm Bradley Jeffery, Wood Brent Eric, Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/353. Mapped technology areas include Physics.
When was this patent published?
Published Mar 14, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).