Robust method to find layout similarity between two documents
US-2015379341-A1 · Dec 31, 2015 · US
US9922247B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9922247-B2 |
| Application number | US-201514588670-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 2, 2015 |
| Priority date | Dec 18, 2013 |
| Publication date | Mar 20, 2018 |
| Grant date | Mar 20, 2018 |
Systems and methods for enhancing and comparing documents. An example method comprises: comparing document images to identify a first document image of a reference document that corresponds with a second document image of a related document; transforming the second document image based on a layout of the first document image; and performing character recognition of the second document image.
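Dependent claims 3 and 4 narrow the comparison step: text is compared using words that have at least a predefined number of characters, and an edit distance is calculated between corresponding words. The Python sketch below illustrates only that text half of the comparison (the layout half is omitted) and uses it to pick which reference page corresponds to a page of the related document. The scoring rule (share of long words with a near match), the defaults `min_len=5` and `max_dist=1`, and all function names are illustrative assumptions; the patent text shown here does not specify them.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def page_similarity(related_words: Sequence[str], reference_words: Sequence[str],
                    min_len: int = 5, max_dist: int = 1) -> float:
    """Hypothetical score: fraction of long words on the related page that have
    a near match (edit distance <= max_dist) among long words on the reference
    page. min_len stands in for the claims' 'predefined number of characters'."""
    rel = [w.lower() for w in related_words if len(w) >= min_len]
    ref = [w.lower() for w in reference_words if len(w) >= min_len]
    if not rel or not ref:
        return 0.0
    matched = sum(1 for w in rel if any(levenshtein(w, r) <= max_dist for r in ref))
    return matched / len(rel)


def best_matching_page(related_words: Sequence[str],
                       reference_pages: List[Sequence[str]]) -> int:
    """Index of the reference page whose text best matches the related page
    (first one wins on ties; raises ValueError if reference_pages is empty)."""
    scores = [page_similarity(related_words, page) for page in reference_pages]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, with `related = "Agreement betwen the parties dated".split()` and a reference page `"Agreement between the parties signed".split()`, the misrecognized `betwen` still counts as a match, giving a score of 0.75. Tolerating small edit distances is one plausible reason to compare first-pass OCR output approximately rather than by exact string equality.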
What is claimed is:
1. A method comprising: performing document analysis of a first document image of a related document; performing a first character recognition of the first document image; comparing the first document image with a second document image of a reference document based on results of the first character recognition; transforming the first document image based on the comparison of the first document image and the second document image; and performing a second character recognition of the transformed first document image using a reference dictionary generated in view of the reference document.
2. The method of claim 1, further comprising calculating differences between the related document and the reference document based on results of the second character recognition.
3. The method of claim 1, wherein the comparing the first document image with the second document image comprises comparing at least a part of a first layout and at least a part of a first text produced by the first character recognition of the first document image to at least a part of a second layout and at least a part of a second text of the second document image.
4. The method of claim 3, wherein the part of the first text and the part of the second text comprise words having at least a predefined number of characters, and wherein the comparing further comprises calculating an edit distance between corresponding words.
5. The method of claim 1, wherein the first document image comprises an image of a page of a document.
6. The method of claim 1, wherein transforming comprises performing a linear transformation of the second document image based on positions of three points within each of the first document image and the second document image.
7. The method of claim 1, wherein the reference dictionary comprises words from only a specific text block of the second document image.
8. A system comprising: a memory; a processor coupled to the memory, the processor configured to: perform document analysis of a first document image of a related document; perform a first character recognition of the first document image; compare the first document image with a second document image of a reference document based on results of the first character recognition; transform the first document image based on the comparison of the first document image and the second document image; and perform a second character recognition of the transformed first document image using a reference dictionary generated in view of the reference document.
9. The system of claim 8, wherein the processor is further configured to calculate differences between the related document and the reference document based on results of the second character recognition.
10. The system of claim 8, wherein comparing the first document image with the second document image comprises comparing at least a part of a first layout and at least a part of a first text produced by the first character recognition of the first document image to at least a part of a second layout and at least a part of a second text of the second document image.
11. The system of claim 10, wherein the part of the first text and the part of the second text comprise words having at least a predefined number of characters, and wherein the comparing further comprises calculating an edit distance between corresponding words.
12. The system of claim 8, wherein the first document image comprises an image of a page of a document.
13. The system of claim 8, wherein transforming comprises performing a linear transformation of the second document image based on positions of three points within each of the first document image and the second document image.
14. The system of claim 8, wherein the reference dictionary comprises words from only a specific text block of the second document image.
15. A computer-readable non-transitory storage medium comprising executable instructions that, when executed by a computing device, cause the computing device to perform operations comprising: performing document analysis of a first document image of a related document; performing a first character recognition of the first document image; comparing the first document image with a second document image of a reference document based on results of the first character recognition; transforming the first document image based on the comparison of the first document image and the second document image; and performing a second character recognition of the transformed first document image using a reference dictionary generated in view of the reference document.
16. The computer-readable non-transitory storage medium of claim 15, further comprising calculating differences between the related document and the reference document based on results of the second character recognition.
17. The computer-readable non-transitory storage medium of claim 15, wherein comparing the first document image with the second document image comprises comparing at least a part of a first layout and a part of a first text produced by the first character recognition of the first document image to at least a part of a second layout and a part of a second text of the second document image.
18. The computer-readable non-transitory storage medium of claim 17, wherein the part of the first text and the part of the second text comprise words having at least a predefined number of characters, and wherein the comparing further comprises calculating an edit distance between corresponding words.
19. The computer-readable non-transitory storage medium of claim 15, wherein transforming comprises performing a linear transformation of the second document image based on positions of three points within each of the first document image and the second document image.
20. The computer-readable non-transitory storage medium of claim 15, wherein the reference dictionary comprises words from only a specific text block of the second document image.
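Claims 6, 13 and 19 base the transformation on the positions of three points in each image. Three non-collinear point pairs determine an affine map (a linear part plus a translation) exactly, so the sketch below assumes that is the intended transformation and solves for its six coefficients with Cramer's rule. The function names are hypothetical, and how the three points are chosen (for example, centers of matched words) is left open here, as it is in the claims.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_three_points(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Hypothetical helper: return (a, b, c, d, e, f) such that each src point
    (x, y) maps to dst as (a*x + b*y + c, d*x + e*y + f)."""
    if len(src) != 3 or len(dst) != 3:
        raise ValueError("exactly three point pairs are required")
    m = [[float(x), float(y), 1.0] for x, y in src]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear")

    def solve(rhs: Sequence[float]) -> Tuple[float, float, float]:
        # Cramer's rule: replace one column of m with the right-hand side.
        coeffs = []
        for col in range(3):
            mc = [row[:] for row in m]
            for i in range(3):
                mc[i][col] = rhs[i]
            coeffs.append(_det3(mc) / det)
        return coeffs[0], coeffs[1], coeffs[2]

    a, b, c = solve([p[0] for p in dst])
    d, e, f = solve([p[1] for p in dst])
    return a, b, c, d, e, f


def apply_affine(t: Affine, p: Point) -> Point:
    """Map a single point with the affine coefficients t."""
    a, b, c, d, e, f = t
    x, y = p
    return (a * x + b * y + c, d * x + e * y + f)
```

Mapping `(0, 0), (100, 0), (0, 100)` onto `(10, 20), (210, 20), (10, 220)` yields a uniform scale of 2 plus a shift of (10, 20). In practice the resulting coefficients would drive an image warp; OpenCV's `cv2.getAffineTransform` computes the same 2×3 matrix from three point pairs, and `cv2.warpAffine` applies it to an image.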
Physics · mapped topic
Document matching, e.g. of document images · CPC title