What technology area does this patent fall under?

Primary CPC classification G06F16/51. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 3, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Method and apparatus for grouping documents based on high-level features clustering

US2019005038A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2019005038-A1
Application number	US-201715639541-A
Country	US
Kind code	A1
Filing date	Jun 30, 2017
Priority date	Jun 30, 2017
Publication date	Jan 3, 2019
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and apparatus for creating a file directory of documents in a database that are clustered based on one or more high level features are disclosed. For example, the method includes identifying the one or more high level features for each one of a plurality of documents stored in the database, comparing the one or more high level features of the each one of the plurality of documents to other documents of the plurality of documents, grouping documents of the plurality of documents into a plurality of clusters based on common high level features that are identified in the comparing and creating the file directory of documents in the database based on the plurality of clusters.
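The grouping and directory-creation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the document names, the `DOCS` mapping, and the helpers `cluster_by_features` and `build_directory` are hypothetical, feature identification is assumed to have already happened, and "common high level features" is interpreted here as an exact match of feature sets, which is only one possible reading.

```python
from collections import defaultdict

# Hypothetical input: features assumed to be already identified per document.
DOCS = {
    "invoice_01.pdf": {"address field", "table"},
    "invoice_02.pdf": {"table", "address field"},
    "newsletter_01.pdf": {"spot title", "margin icon", "text flow"},
    "letter_01.pdf": {"address field"},
}


def cluster_by_features(docs):
    """Group documents whose identified feature sets are identical (assumption)."""
    clusters = defaultdict(list)
    for name, features in docs.items():
        # frozenset makes the feature set hashable and order-independent.
        clusters[frozenset(features)].append(name)
    return clusters


def build_directory(clusters):
    """Map a readable folder label to the sorted documents of each cluster."""
    directory = {}
    for features, names in clusters.items():
        label = "+".join(sorted(features)) or "no-features"
        directory[label] = sorted(names)
    return directory


directory = build_directory(cluster_by_features(DOCS))
for folder, names in sorted(directory.items()):
    print(folder, names)
```

Using the feature set itself as the cluster key keeps the sketch short; a looser similarity measure (for example, a minimum number of shared features) would need an explicit pairwise comparison step.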

First claim

Opening claim text (preview).

What is claimed is:

1. A method for creating a file directory of documents in a database that are clustered based on one or more high level features, comprising: identifying, by a processor, the one or more high level features for each one of a plurality of documents stored in the database; comparing, by the processor, the one or more high level features of the each one of the plurality of documents to other documents of the plurality of documents; grouping, by the processor, documents of the plurality of documents into a plurality of clusters based on common high level features that are identified in the comparing; and creating, by the processor, the file directory of documents in the database based on the plurality of clusters.

2. The method of claim 1, wherein the one or more high level features comprises a spot title, an address field, a margin icon, a table, a border area, or a text flow.

3. The method of claim 1, wherein the one or more high level features are identified based on a predefined set of rules.

4. The method of claim 3, wherein the predefined set of rules comprises a size of a feature and a location of the feature relative to an origin.

5. The method of claim 4, wherein the origin comprises a top left corner of the document.

6. The method of claim 3, wherein the one or more high level features comprise a pre-defined priority level.

7. The method of claim 6, wherein a feature comprising two different rules of the predefined set of rules is identified based on the pre-defined priority level.

8. The method of claim 1, wherein the identifying and the comparing is performed based on only a first page the each one of the plurality of documents.

9. The method of claim 1, wherein the documents in each one of the plurality of clusters share a same number of different high level features.

10. A non-transitory computer-readable medium storing a plurality of instructions, which when executed by a processor, cause the processor to perform operations for creating a file directory of documents in a database that are clustered based on one or more high level features, the operations comprising: identifying the one or more high level features for each one of a plurality of documents stored in the database; comparing the one or more high level features of the each one of the plurality of documents to other documents of the plurality of documents; grouping documents of the plurality of documents into a plurality of clusters based on common high level features that are identified in the comparing; and creating the file directory of documents in the database based on the plurality of clusters.

11. The non-transitory computer-readable medium of claim 10, wherein the one or more high level features comprises a spot title, an address field, a margin icon, a table, a border area, or a text flow.

12. The non-transitory computer-readable medium of claim 10, wherein the one or more high level features are identified based on a predefined set of rules.

13. The non-transitory computer-readable medium of claim 12, wherein the predefined set of rules comprises a size of a feature and a location of the feature relative to an origin.

14. The non-transitory computer-readable medium of claim 13, wherein the origin comprises a top left corner of the document.

15. The non-transitory computer-readable medium of claim 12, wherein the one or more high level features comprise a pre-defined priority level.

16. The non-transitory computer-readable medium of claim 15, wherein a feature comprising two different rules of the predefined set of rules is identified based on the pre-defined priority level.

17. The non-transitory computer-readable medium of claim 10, wherein the identifying and the comparing is performed based on only a first page the each one of the plurality of documents.

18. The non-transitory computer-readable medium of claim 10, wherein the documents in each one of the plurality of clusters share a same number of different high level features.

19. A method for creating a file directory of documents in a database that are clustered based on one or more high level features, comprising: scanning, by a processor, a plurality of segments of each one of a plurality of documents stored in the database, wherein the plurality segments have a predefined size; comparing, by the processor, images in each one of the plurality of segments to a plurality of predefined rules, wherein each one of the plurality of predefined rules is associated with a different high level feature; identifying, by the processor, the one or more high level features based on the comparing for the each one of a plurality of documents; comparing, by the processor, the one or more high level features of the each one of the plurality of documents to other documents of the plurality of documents; grouping, by the processor, documents of the plurality of documents into a plurality of clusters, wherein the documents in each one of the plurality of clusters share a same number of different high level features that are identified based on the comparing; and creating, by the processor, the file directory of documents in the database based on the plurality of clusters.

20. The method of claim 19, wherein the one or more high level features comprises a spot title, an address filed, a margin icon, a table, a border area, or a text flow.
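Claims 3 to 7 describe identifying features with predefined rules based on size and location relative to the top left corner, with a priority level deciding between competing rules. The following Python sketch shows one way such a rule set could look. Everything in it is an assumption made for illustration: the `Region` and `Rule` classes, the numeric thresholds (expressed as fractions of page width and height), and the convention that a lower priority number wins are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[float, float]  # inclusive (low, high) bounds


@dataclass(frozen=True)
class Region:
    """A candidate area; x and y are offsets from the page's top left corner."""
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: location and size bounds plus a priority level."""
    feature: str
    priority: int  # assumption: lower number means higher priority
    x_range: Range
    y_range: Range
    width_range: Range
    height_range: Range

    def matches(self, r: Region) -> bool:
        return (
            self.x_range[0] <= r.x <= self.x_range[1]
            and self.y_range[0] <= r.y <= self.y_range[1]
            and self.width_range[0] <= r.width <= self.width_range[1]
            and self.height_range[0] <= r.height <= self.height_range[1]
        )


# Illustrative thresholds only; not values from the patent.
RULES = [
    Rule("spot title", 1, (0.0, 1.0), (0.0, 0.2), (0.3, 1.0), (0.03, 0.15)),
    Rule("address field", 2, (0.0, 0.6), (0.1, 0.4), (0.2, 0.5), (0.05, 0.2)),
    Rule("margin icon", 3, (0.0, 0.1), (0.0, 1.0), (0.0, 0.1), (0.0, 0.1)),
]


def identify(region: Region, rules=RULES) -> Optional[str]:
    """Return the feature of the highest-priority matching rule, if any."""
    matching = [rule for rule in rules if rule.matches(region)]
    if not matching:
        return None
    return min(matching, key=lambda rule: rule.priority).feature


ambiguous = Region(x=0.1, y=0.15, width=0.4, height=0.1)  # fits two rules
print(identify(ambiguous))
```

The `ambiguous` region satisfies both the "spot title" and "address field" rules, so the priority level is what selects a single feature, mirroring the situation described in claims 7 and 16.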

Assignees

Xerox Corp

Inventors

Classifications

G06F16/5854
using shape and object relationship · CPC title
G06F16/51 · Primary
Indexing; Data structures therefor; Storage structures · CPC title
G06F16/93 · Primary
Document management systems · CPC title
G06F17/30253
Physics · mapped topic
G06F17/30011
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 64738884

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019005038A1 cover?: A method and apparatus for creating a file directory of documents in a database that are clustered based on one or more high level features are disclosed. For example, the method includes identifying the one or more high level features for each one of a plurality of documents stored in the database, comparing the one or more high level features of the each one of the plurality of documents to o…
Who is the assignee on this patent?: Xerox Corp