Automatic document classification using machine learning

US11238313B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11238313-B2
Application number: US-201916559322-A
Country: US
Kind code: B2
Filing date: Sep 3, 2019
Priority date: Sep 3, 2019
Publication date: Feb 1, 2022
Grant date: Feb 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Automatic document classification using machine learning may involve receiving inputs that assign documents to classifiers, which define document classification rules for a classification model. The computing device may train the classification model using a machine learning technique that assigns each document of a second set of documents to destinations based on the document classification rules. The computing device may also receive a template design for each destination that specifies metadata to extract for a document type corresponding to documents assigned to the destination. The computing device may subsequently classify a particular document using the classification model, which may involve assigning the particular document to a given destination of the plurality of destinations based on the document classification rules, and exporting metadata from the particular document using the template design associated with the given destination.
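
The flow described in the abstract (assign a document to a destination, then extract metadata with that destination's template) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the token-overlap score is a simple stand-in for the unspecified machine learning technique, and the names `Destination`, `classify`, `extract_metadata`, and the regex-based template format are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Destination:
    """A destination (e.g. a folder) with labeled examples and a template."""
    name: str
    examples: list[str] = field(default_factory=list)
    # Hypothetical template design: metadata field -> regex with one group.
    template: dict[str, str] = field(default_factory=dict)


def tokens(text: str) -> set[str]:
    """Lowercased alphabetic tokens of a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(document: str, destinations: list[Destination]) -> Destination:
    """Pick the destination whose best example has the highest Jaccard overlap.

    Assumption: token overlap stands in for the trained classification model.
    """
    doc_tokens = tokens(document)

    def score(dest: Destination) -> float:
        best = 0.0
        for example in dest.examples:
            ex_tokens = tokens(example)
            union = doc_tokens | ex_tokens
            if union:
                best = max(best, len(doc_tokens & ex_tokens) / len(union))
        return best

    return max(destinations, key=score)


def extract_metadata(document: str, dest: Destination) -> dict[str, str]:
    """Apply the destination's template and return the fields that matched."""
    extracted = {}
    for name, pattern in dest.template.items():
        match = re.search(pattern, document)
        if match:
            extracted[name] = match.group(1)
    return extracted


# Usage: two destinations, each defined by one user-assigned example.
invoices = Destination(
    "invoices",
    ["Invoice number 100 total due 50.00"],
    {"invoice_number": r"Invoice number (\d+)", "total": r"total due ([\d.]+)"},
)
resumes = Destination(
    "resumes",
    ["Resume of Jane Doe education experience skills"],
    {"name": r"Resume of ([A-Z][a-z]+ [A-Z][a-z]+)"},
)

doc = "Invoice number 2471 total due 99.50"
dest = classify(doc, [invoices, resumes])
metadata = extract_metadata(doc, dest)
```

Keeping the template attached to the destination mirrors the abstract's pairing of one template design per destination: classification decides which extraction rules apply.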

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, at a computing device and from a user interface, a set of inputs, wherein each input assigns a document of a first set of documents to a classifier of a set of classifiers such that the set of classifiers define document classification rules for a classification model; training, by the computing device, the classification model using a machine learning technique that assigns each document of a second set of documents to a particular destination of a plurality of destinations based on the document classification rules; receiving, at the computing device, a template design for each destination of the plurality of destinations, wherein each template design specifies metadata to extract for a document type corresponding to documents assigned to a destination; subsequently classifying a particular document using the classification model, wherein classifying the particular document involves: (i) assigning the particular document to a given destination of the plurality of destinations based on the document classification rules, (ii) exporting at least some metadata from the particular document using the template design associated with the given destination, and (iii) obtaining additional information using content of extracted metadata to supplement the exported metadata; determining an accuracy rate based on at least manual validation received from a user or automatic validation based on predefined rules; and providing a notification to the user in response to a determination that the accuracy rate is below a threshold accuracy rate.

2. The method of claim 1, further comprising: receiving the first set of documents via one or more electronic messages and one or more uploads.

3. The method of claim 2, wherein the computing device is a printing device, and wherein receiving the first set of documents further comprises receiving at least one document of the first set of documents using a scanning process.

4. The method of claim 1, wherein training the classification model using the machine learning technique comprises: determining a structure of a given document of the second set of documents by analyzing an arrangement of metadata in the given document without using text recognition; identifying a destination defined by a particular classifier such that the destination includes documents that have respective structures similar to the structure of the given document; and assigning the given document to the identified destination.

5. The method of claim 1, wherein training the classification model using the machine learning technique comprises: determining metadata of a given document of the second set of documents; identifying a destination defined by a particular classifier such that the destination includes documents that have respective metadata similar to the metadata of the given document; and assigning the given document to the identified destination.

6. The method of claim 1, wherein subsequently classifying a particular document using the classification model comprises: assigning the particular document to an unclassified folder; and providing a notification that indicates the particular document was assigned to the unclassified folder.

7. The method of claim 6, wherein providing the notification that indicates the particular document was routed to the unclassified folder further comprises: providing a link with the notification that enables manual classification of the particular document.

8. The method of claim 7, further comprising: receiving manual classification of the particular document; and subsequently training the classification model using the manual classification of the particular document such that the classification model classifies documents similar to the particular document based on the manual classification.

9. The method of claim 1, further comprising: responsive to subsequently classifying the particular document using the classification model, obtaining information associated with a source of the particular document from a database based on metadata extracted from the particular document using the template design associated with the given destination.

10. A system comprising: a user interface; a computing device configured to: receive, from the user interface, a set of inputs, wherein each input assigns a document of a first set of documents to a classifier of a set of classifiers such that the set of classifiers define document classification rules for a classification model; train the classification model using a machine learning technique that assigns each document of a second set of documents to a particular destination of a plurality of destinations based on the document classification rules; receive a template design for each destination of the plurality of destinations, wherein each template design specifies metadata to extract for a document type corresponding to documents assigned to a destination; subsequently classify a particular document using the classification model, wherein classifying the particular document involves: (i) assigning the particular document to a given destination of the plurality of destinations based on the document classification rules, (ii) exporting at least some metadata from the particular document using the template design associated with the given destination, and (iii) obtaining additional information using content of extracted metadata to supplement the exported metadata; determine an accuracy rate based on at least manual validation received from a user or automatic validation based on predefined rules; and provide a notification to the user in response to a determination that the accuracy rate is below a threshold accuracy rate.

11. The system of claim 10, wherein the computing device is further configured to: responsive to training the classification model using the machine learning technique, providing the classification model to a second computing device, wherein the second computing device is configured to execute the classification model.

12. The system of claim 10, wherein each destination of the plurality of destinations corresponds to a folder for storing documents.

13. The system of claim 10, wherein the computing device is a printing device comprising a scanner for scanning documents.

14. The system of claim 10, wherein the computing device is configured to train the classification model using the machine learning technique to: determine a structure of a given document of the second set of documents by analyzing an arrangement of metadata in the given document without using text recognition; identify a destination defined by a particular classifier such that the destination includes documents that have respective structures similar to the structure of the given document; and assign the given document to the identified destination.

15. The system of claim 10, wherein the computing device is configured to train the classification model using the machine learning technique to: determine metadata of a given document of the second set of documents; identify a destination defined by a particular classifier such that the destination includes documents that have respective metadata similar to the metadata of the given document; and assign the given document to the identified destination.

16. The system of claim 10, wherein the computing device is configured to subsequently classify the particular document using the classification model via assigning the particular document to a destination associated with unclas
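
Two of the claimed behaviors, routing to an unclassified folder (claim 6) and notifying the user when the accuracy rate falls below a threshold (claims 1 and 10), might look something like the sketch below. It is only an illustration under assumptions: the claims do not say how a document ends up unclassified, so the `min_score` cutoff is a guess, and `Result`, `route`, `accuracy_rate`, `check_accuracy`, and the `notify` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

UNCLASSIFIED = "unclassified"


@dataclass
class Result:
    document_id: str
    destination: str
    # Outcome of manual or rule-based validation; None means not yet validated.
    validated_correct: Optional[bool] = None


def route(document_id: str, scores: dict[str, float],
          min_score: float = 0.5) -> Result:
    """Assign to the best-scoring destination, or to the unclassified folder.

    Assumption: a low model score is what triggers the unclassified folder.
    """
    if not scores:
        return Result(document_id, UNCLASSIFIED)
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        return Result(document_id, UNCLASSIFIED)
    return Result(document_id, best)


def accuracy_rate(results: list[Result]) -> Optional[float]:
    """Fraction of validated results marked correct; None if none validated."""
    validated = [r for r in results if r.validated_correct is not None]
    if not validated:
        return None
    return sum(r.validated_correct for r in validated) / len(validated)


def check_accuracy(results: list[Result], threshold: float,
                   notify: Callable[[str], None]) -> bool:
    """Notify the user when the accuracy rate is below the threshold."""
    rate = accuracy_rate(results)
    if rate is not None and rate < threshold:
        notify(f"Accuracy rate {rate:.0%} is below threshold {threshold:.0%}")
        return True
    return False


# Usage: three documents, two of which have been validated.
results = [
    route("a", {"invoices": 0.9, "resumes": 0.1}),
    route("b", {"invoices": 0.3, "resumes": 0.2}),  # low score: unclassified
    route("c", {"resumes": 0.8}),
]
results[0].validated_correct = True
results[2].validated_correct = False

messages: list[str] = []
notified = check_accuracy(results, threshold=0.8, notify=messages.append)
```

Passing `notify` as a callback keeps the sketch independent of any delivery channel; the claims leave the form of the notification open.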

Assignees

Kyocera Document Solutions Inc

Inventors

Classifications

  • with the intervention of an operator · CPC title

  • Rule-based classification · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor · CPC title

  • Interactive pattern learning with a human teacher · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11238313B2 cover?
Automatic document classification using machine learning may involve receiving inputs that assign documents to classifiers, which define document classification rules for a classification model. The computing device may train the classification model using a machine learning technique that assigns each document of a second set of documents to destinations based on the document classification ru…
Who is the assignee on this patent?
Kyocera Document Solutions Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 1, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).