Robust method to find layout similarity between two documents
US-2015379341-A1 · Dec 31, 2015 · US
US2019005038A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2019005038-A1 |
| Application number | US-201715639541-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 30, 2017 |
| Priority date | Jun 30, 2017 |
| Publication date | Jan 3, 2019 |
| Grant date | — |
Official abstract text for this publication.
A method and apparatus for creating a file directory of documents in a database that are clustered based on one or more high level features are disclosed. For example, the method includes identifying the one or more high level features for each one of a plurality of documents stored in the database, comparing the one or more high level features of the each one of the plurality of documents to other documents of the plurality of documents, grouping documents of the plurality of documents into a plurality of clusters based on common high level features that are identified in the comparing and creating the file directory of documents in the database based on the plurality of clusters.
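The grouping and directory steps in the abstract can be illustrated with a minimal sketch. It assumes the high level features of each document have already been identified, and it assumes that "common high level features" means an identical feature set. The abstract does not fix the matching criterion, so this is one possible reading. The function name `build_directory`, the cluster labels, and the sample file names are hypothetical.

```python
from collections import defaultdict


def build_directory(doc_features):
    """Group documents whose sets of high level features are identical.

    doc_features maps a document id to an iterable of feature names.
    Returns a dict mapping a cluster label to a sorted list of document ids.
    Exact set equality is an assumption made for this sketch.
    """
    clusters = defaultdict(list)
    for doc_id, features in doc_features.items():
        # A frozenset ignores feature order and can be used as a dict key.
        clusters[frozenset(features)].append(doc_id)

    directory = {}
    for key, doc_ids in clusters.items():
        # Hypothetical label: the sorted feature names joined with "+".
        label = "+".join(sorted(key)) or "no_features"
        directory[label] = sorted(doc_ids)
    return directory


# Hypothetical input: features already identified for each document.
docs = {
    "invoice_01.pdf": ["table", "address field"],
    "invoice_02.pdf": ["address field", "table"],
    "memo_01.pdf": ["spot title", "text flow"],
    "scan_07.pdf": [],
}
directory = build_directory(docs)
for label, members in sorted(directory.items()):
    print(label, members)
```

A looser criterion, such as grouping documents that share a given number of features, would replace the `frozenset` key with a pairwise comparison step.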
Opening claim text (preview).
What is claimed is:

1. A method for creating a file directory of documents in a database that are clustered based on one or more high level features, comprising: identifying, by a processor, the one or more high level features for each one of a plurality of documents stored in the database; comparing, by the processor, the one or more high level features of the each one of the plurality of documents to other documents of the plurality of documents; grouping, by the processor, documents of the plurality of documents into a plurality of clusters based on common high level features that are identified in the comparing; and creating, by the processor, the file directory of documents in the database based on the plurality of clusters.
2. The method of claim 1, wherein the one or more high level features comprises a spot title, an address field, a margin icon, a table, a border area, or a text flow.
3. The method of claim 1, wherein the one or more high level features are identified based on a predefined set of rules.
4. The method of claim 3, wherein the predefined set of rules comprises a size of a feature and a location of the feature relative to an origin.
5. The method of claim 4, wherein the origin comprises a top left corner of the document.
6. The method of claim 3, wherein the one or more high level features comprise a pre-defined priority level.
7. The method of claim 6, wherein a feature comprising two different rules of the predefined set of rules is identified based on the pre-defined priority level.
8. The method of claim 1, wherein the identifying and the comparing is performed based on only a first page of the each one of the plurality of documents.
9. The method of claim 1, wherein the documents in each one of the plurality of clusters share a same number of different high level features.
10. A non-transitory computer-readable medium storing a plurality of instructions, which when executed by a processor, cause the processor to perform operations for creating a file directory of documents in a database that are clustered based on one or more high level features, the operations comprising: identifying the one or more high level features for each one of a plurality of documents stored in the database; comparing the one or more high level features of the each one of the plurality of documents to other documents of the plurality of documents; grouping documents of the plurality of documents into a plurality of clusters based on common high level features that are identified in the comparing; and creating the file directory of documents in the database based on the plurality of clusters.
11. The non-transitory computer-readable medium of claim 10, wherein the one or more high level features comprises a spot title, an address field, a margin icon, a table, a border area, or a text flow.
12. The non-transitory computer-readable medium of claim 10, wherein the one or more high level features are identified based on a predefined set of rules.
13. The non-transitory computer-readable medium of claim 12, wherein the predefined set of rules comprises a size of a feature and a location of the feature relative to an origin.
14. The non-transitory computer-readable medium of claim 13, wherein the origin comprises a top left corner of the document.
15. The non-transitory computer-readable medium of claim 12, wherein the one or more high level features comprise a pre-defined priority level.
16. The non-transitory computer-readable medium of claim 15, wherein a feature comprising two different rules of the predefined set of rules is identified based on the pre-defined priority level.
17. The non-transitory computer-readable medium of claim 10, wherein the identifying and the comparing is performed based on only a first page of the each one of the plurality of documents.
18. The non-transitory computer-readable medium of claim 10, wherein the documents in each one of the plurality of clusters share a same number of different high level features.
19. A method for creating a file directory of documents in a database that are clustered based on one or more high level features, comprising: scanning, by a processor, a plurality of segments of each one of a plurality of documents stored in the database, wherein the plurality of segments have a predefined size; comparing, by the processor, images in each one of the plurality of segments to a plurality of predefined rules, wherein each one of the plurality of predefined rules is associated with a different high level feature; identifying, by the processor, the one or more high level features based on the comparing for the each one of a plurality of documents; comparing, by the processor, the one or more high level features of the each one of the plurality of documents to other documents of the plurality of documents; grouping, by the processor, documents of the plurality of documents into a plurality of clusters, wherein the documents in each one of the plurality of clusters share a same number of different high level features that are identified based on the comparing; and creating, by the processor, the file directory of documents in the database based on the plurality of clusters.
20. The method of claim 19, wherein the one or more high level features comprises a spot title, an address field, a margin icon, a table, a border area, or a text flow.
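The rule-based identification recited in the dependent claims (a rule built from a feature's size and its location relative to the top left corner, with a priority level deciding between two matching rules) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Region` and `Rule` types, the use of page fractions as units, every numeric threshold, and the convention that a lower priority number wins are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Range = Tuple[float, float]


@dataclass(frozen=True)
class Region:
    """A detected region; all values are fractions of the page size,
    with x and y measured from the top left corner (the origin)."""
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Rule:
    """A hypothetical rule: allowed location and size ranges for one feature."""
    feature: str
    priority: int  # assumption: a lower number means a higher priority
    x_range: Range
    y_range: Range
    width_range: Range
    height_range: Range

    def matches(self, r: Region) -> bool:
        return (self.x_range[0] <= r.x <= self.x_range[1]
                and self.y_range[0] <= r.y <= self.y_range[1]
                and self.width_range[0] <= r.width <= self.width_range[1]
                and self.height_range[0] <= r.height <= self.height_range[1])


def identify(region: Region, rules: Sequence[Rule]) -> Optional[str]:
    """Return the feature of the highest-priority matching rule, or None."""
    matching = [rule for rule in rules if rule.matches(region)]
    if not matching:
        return None
    return min(matching, key=lambda rule: rule.priority).feature


# Illustrative thresholds only; none of these values come from the patent.
RULES = [
    Rule("border area", 1, (0.0, 0.05), (0.0, 0.05), (0.9, 1.0), (0.9, 1.0)),
    Rule("table", 2, (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (0.2, 1.0)),
]

full_page_box = Region(0.02, 0.02, 0.95, 0.95)  # satisfies both rules
mid_page_grid = Region(0.10, 0.40, 0.80, 0.30)  # satisfies only "table"
tiny_speck = Region(0.50, 0.50, 0.01, 0.01)     # satisfies no rule

print(identify(full_page_box, RULES))
print(identify(mid_page_grid, RULES))
print(identify(tiny_speck, RULES))
```

In this sketch the first region satisfies both rules and is labeled by the rule with the higher priority, which is one way to read the priority-level claims. The segment scanning of claim 19 would sit upstream of `identify`, producing the regions that the rules are tested against.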