What technology area does this patent fall under?

Primary CPC classification G06F16/285. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 7, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Label-based document classification using artificial intelligence

US11809454B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11809454-B2
Application number	US-202017100864-A
Country	US
Kind code	B2
Filing date	Nov 21, 2020
Priority date	Nov 21, 2020
Publication date	Nov 7, 2023
Grant date	Nov 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Label-based document classification using artificial intelligence includes collecting, by one or more processors, a plurality of pre-trained classification models into a model pool and a plurality of documents into a document pool. The collected plurality of pre-trained classification models are applied in parallel to the plurality of documents in the document pool to generate a list of labels. Based on the list of labels, a final label result is generated according to which a baseline algorithm for document classification is generated by the one or more processors.
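The parallel labeling step in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the three keyword-based "models" are hypothetical stand-ins for pre-trained classifiers, the names `MODEL_POOL` and `label_documents` are invented for this example, and a thread pool is only one possible way to run the models concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A "model" is any callable that maps document text to a label.
Model = Callable[[str], str]


# Hypothetical stand-ins for pre-trained classification models.
def invoice_model(text: str) -> str:
    return "invoice" if "amount due" in text.lower() else "other"


def contract_model(text: str) -> str:
    return "contract" if "hereinafter" in text.lower() else "other"


def length_model(text: str) -> str:
    return "memo" if len(text.split()) < 8 else "report"


# The model pool: every collected model, keyed by an illustrative name.
MODEL_POOL: Dict[str, Model] = {
    "invoice": invoice_model,
    "contract": contract_model,
    "length": length_model,
}


def label_documents(
    model_pool: Dict[str, Model], document_pool: List[str]
) -> List[Dict[str, str]]:
    """Apply every model to every document concurrently.

    Returns one dict per document, mapping model name to predicted label,
    so each document receives its whole list of labels in a single pass.
    """
    with ThreadPoolExecutor() as executor:
        futures = {
            (doc_idx, name): executor.submit(model, doc)
            for doc_idx, doc in enumerate(document_pool)
            for name, model in model_pool.items()
        }
    # Leaving the "with" block waits for all submitted tasks to finish.
    results: List[Dict[str, str]] = [{} for _ in document_pool]
    for (doc_idx, name), future in futures.items():
        results[doc_idx][name] = future.result()
    return results
```

Calling `label_documents(MODEL_POOL, documents)` on a list of strings yields, for each document, the labels proposed by all models at once. Combining those labels into a single final label is a separate step.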

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for document classification comprising: collecting, by one or more processors, a plurality of pre-trained classification models into a model pool; collecting, by the one or more processors, a plurality of documents into a document pool; applying in parallel, by the one or more processors, the collected plurality of pre-trained classification models to the plurality of documents in the document pool to simultaneously generate a list of labels for document classification rather than generating one label at a time; applying, by the one or more processors, a weighted voting method to the list of labels for optimizing weights of base classifiers; based on a weighted voting combination rule, determining, by the one or more processors, a final label result from the list of labels; and building, by the one or more processors, a baseline algorithm for document classification based on the determined final label result.
2. The method of claim 1, wherein predicting the list of labels further comprises: using, by the one or more processors, a word length N as a sliding window to obtain text data from the plurality of documents.
3. The method of claim 1, wherein the final label result is generated as a hard tag.
4. The method of claim 1, wherein determining the final label result further comprises: retaining, by the one or more processors, a prediction value of the sliding window to generate a soft tag.
5. The method of claim 1, wherein building the baseline algorithm further comprises: converting, by the one or more processors, the list of labels into a feature matrix comprising one or more one-hot vectors.
6. The method of claim 5, wherein a length of a one-vector in the one or more one-hot vectors is aligned with a longest label dimension of all labels in the list of labels.
7. A computer system for document classification, comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising: collecting, by one or more processors, a plurality of pre-trained classification models into a model pool; collecting, by the one or more processors, a plurality of documents into a document pool; applying in parallel, by the one or more processors, the collected plurality of pre-trained classification models to the plurality of documents in the document pool to simultaneously generate a list of labels for document classification rather than generating one label at a time; applying, by the one or more processors, a weighted voting method to the list of labels for optimizing weights of base classifiers; based on a weighted voting combination rule, determining, by the one or more processors, a final label result from the list of labels; and building, by the one or more processors, a baseline algorithm for document classification based on the determined final label result.
8. The computer system of claim 7, wherein predicting the list of labels further comprises: using, by the one or more processors, a word length N as a sliding window to obtain text data from the plurality of documents.
9. The computer system of claim 7, wherein the final label result is generated as a hard tag.
10. The computer system of claim 7, wherein determining the final label result further comprises: retaining, by the one or more processors, a prediction value of the sliding window to generate a soft tag.
11. The computer system of claim 7, wherein building the baseline algorithm further comprises: converting, by the one or more processors, the list of labels into a feature matrix comprising one or more one-hot vectors.
12. The computer system of claim 11, wherein a length of a one-vector in the one or more one-hot vectors is aligned with a longest label dimension of all labels in the list of labels.
13. A computer program product for document classification, comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: program instructions to collect, by one or more processors, a plurality of pre-trained classification models into a model pool; program instructions to collect, by the one or more processors, a plurality of documents into a document pool; program instructions to apply in parallel, by the one or more processors, the collected plurality of pre-trained classification models to the plurality of documents in the document pool to simultaneously generate a list of labels for document classification rather than generating one label at a time; program instruction to apply, by the one or more processors, a weighted voting method to the list of labels for optimizing weights of base classifiers; based on a weighted voting combination rule, program instructions to determine, by the one or more processors, a final label result from the list of labels; and program instructions to build, by the one or more processors, a baseline algorithm for document classification based on the determined final label result.
14. The computer program product of claim 13, wherein predicting the list of labels further comprises: using, by the one or more processors, a word length N as a sliding window to obtain text data from the plurality of documents.
15. The computer program product of claim 13, wherein the final label result is generated as a hard tag.
16. The computer program product of claim 13, wherein determining the final label result further comprises: retaining, by the one or more processors, a prediction value of the sliding window to generate a soft tag.
17. The computer program product of claim 13, wherein building the baseline algorithm further comprises: converting, by the one or more processors, the list of labels into a feature matrix comprising one or more one-hot vectors, wherein a length of a one-vector in the one or more one-hot vectors is aligned with a longest label dimension of all labels in the list of labels.
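The dependent claims mention a sliding window of word length N, hard and soft tags, a weighted voting combination rule, and a one-hot feature matrix. The sketch below shows one plausible reading of each; it is an illustration under assumptions, not the claimed method. In particular, all function names are hypothetical, the weights are fixed inputs (the weight optimization recited in the claims is not shown), the "soft tag" is assumed to be the retained normalized vote score per label, and "aligned with a longest label dimension" is read as padding every one-hot vector to the size of the largest per-model label set.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sliding_windows(text: str, n: int) -> List[str]:
    """Split text into overlapping windows of n words (stride of one word)."""
    if n < 1:
        raise ValueError("window length n must be positive")
    words = text.split()
    if len(words) <= n:
        return [" ".join(words)]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def soft_tag(labels: Sequence[str], weights: Sequence[float]) -> Dict[str, float]:
    """Weighted vote: sum the weight behind each label, then normalize.

    Assumption: the retained scores play the role of the "soft tag".
    """
    scores: Dict[str, float] = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


def hard_tag(soft: Dict[str, float]) -> str:
    """Hard tag: the single label with the highest weighted score."""
    return max(soft, key=soft.get)


def one_hot_matrix(
    predictions: Sequence[Tuple[str, str]],
    label_spaces: Dict[str, List[str]],
) -> List[List[int]]:
    """Encode (model name, label) pairs as rows of equal length.

    Assumption: each model has its own label set, and every row is padded
    to the length of the longest label set so the matrix is rectangular.
    """
    width = max(len(space) for space in label_spaces.values())
    matrix = []
    for model_name, label in predictions:
        row = [0] * width
        row[label_spaces[model_name].index(label)] = 1
        matrix.append(row)
    return matrix
```

For example, three base classifiers voting `["invoice", "invoice", "contract"]` with weights `[0.5, 0.3, 0.2]` give a soft tag that favors "invoice", and `hard_tag` collapses it to that single label. The one-hot rows could then serve as input features for whatever baseline classifier is trained downstream.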

Assignees

IBM

Inventors

Classifications

G06F16/285 (Primary)
Clustering or classification · CPC title
G06F40/117
Tagging; Marking up (details of markup languages G06F40/143); Designating a block; Setting of attributes (style sheets, e.g. eXtensible Stylesheet Language Transformation [XSLT], G06F40/154) · CPC title
G06F40/30
Semantic analysis · CPC title
G06N20/00
Machine learning · CPC title

Patent family

Related publications grouped by family.

View patent family 81657029

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11809454B2 cover?: Label-based document classification using artificial intelligence includes collecting, by one or more processors, a plurality of pre-trained classification models into a model pool and a plurality of documents into a document pool. The collected plurality of pre-trained classification models are applied in parallel to the plurality of documents in the document pool to generate a list of labels.…
Who is the assignee on this patent?: IBM