US11809454B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11809454-B2 |
| Application number | US-202017100864-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 21, 2020 |
| Priority date | Nov 21, 2020 |
| Publication date | Nov 7, 2023 |
| Grant date | Nov 7, 2023 |
Official abstract text for this publication.
Label-based document classification using artificial intelligence includes collecting, by one or more processors, a plurality of pre-trained classification models into a model pool and a plurality of documents into a document pool. The collected plurality of pre-trained classification models are applied in parallel to the plurality of documents in the document pool to generate a list of labels. Based on the list of labels, a final label result is generated according to which a baseline algorithm for document classification is generated by the one or more processors.
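The flow described in the abstract (a model pool applied in parallel to a document pool, followed by combining the resulting labels into a final label) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in: the three toy "models" are plain functions rather than real pre-trained classifiers, the weights are hand-picked constants rather than optimized values, and a thread pool is only one possible way to run models in parallel. The weighted vote is taken from the claims, not from the abstract itself.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for pre-trained classifiers: each maps text to a label.
def keyword_model(text):
    return "finance" if "invoice" in text.lower() else "general"

def length_model(text):
    return "finance" if len(text.split()) > 5 else "general"

def digit_model(text):
    return "finance" if any(ch.isdigit() for ch in text) else "general"

# Model pool as (model, weight) pairs. The weights are illustrative
# assumptions; this sketch does not optimize them.
MODEL_POOL = [(keyword_model, 0.5), (length_model, 0.2), (digit_model, 0.3)]

def label_list(document, pool=MODEL_POOL):
    """Apply every model in the pool to one document concurrently."""
    with ThreadPoolExecutor(max_workers=len(pool)) as executor:
        futures = [executor.submit(model, document) for model, _ in pool]
        # Results are collected in pool order, so labels align with weights.
        return [future.result() for future in futures]

def weighted_vote(labels, pool=MODEL_POOL):
    """Sum the weight behind each label and return the heaviest label."""
    scores = defaultdict(float)
    for label, (_, weight) in zip(labels, pool):
        scores[label] += weight
    return max(scores, key=scores.get)

def classify_pool(documents, pool=MODEL_POOL):
    """Return a final label for each document in the document pool."""
    return {doc: weighted_vote(label_list(doc, pool), pool) for doc in documents}
```

Calling `classify_pool(["Invoice 42 due", "hello world"])` yields one final label per document. In a real system the weights would be fitted on held-out data, which this sketch leaves out.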
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for document classification comprising: collecting, by one or more processors, a plurality of pre-trained classification models into a model pool; collecting, by the one or more processors, a plurality of documents into a document pool; applying in parallel, by the one or more processors, the collected plurality of pre-trained classification models to the plurality of documents in the document pool to simultaneously generate a list of labels for document classification rather than generating one label at a time; applying, by the one or more processors, a weighted voting method to the list of labels for optimizing weights of base classifiers; based on a weighted voting combination rule, determining, by the one or more processors, a final label result from the list of labels; and building, by the one or more processors, a baseline algorithm for document classification based on the determined final label result.
2. The method of claim 1, wherein predicting the list of labels further comprises: using, by the one or more processors, a word length N as a sliding window to obtain text data from the plurality of documents.
3. The method of claim 1, wherein the final label result is generated as a hard tag.
4. The method of claim 1, wherein determining the final label result further comprises: retaining, by the one or more processors, a prediction value of the sliding window to generate a soft tag.
5. The method of claim 1, wherein building the baseline algorithm further comprises: converting, by the one or more processors, the list of labels into a feature matrix comprising one or more one-hot vectors.
6. The method of claim 5, wherein a length of a one-vector in the one or more one-hot vectors is aligned with a longest label dimension of all labels in the list of labels.
7. A computer system for document classification, comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising: collecting, by one or more processors, a plurality of pre-trained classification models into a model pool; collecting, by the one or more processors, a plurality of documents into a document pool; applying in parallel, by the one or more processors, the collected plurality of pre-trained classification models to the plurality of documents in the document pool to simultaneously generate a list of labels for document classification rather than generating one label at a time; applying, by the one or more processors, a weighted voting method to the list of labels for optimizing weights of base classifiers; based on a weighted voting combination rule, determining, by the one or more processors, a final label result from the list of labels; and building, by the one or more processors, a baseline algorithm for document classification based on the determined final label result.
8. The computer system of claim 7, wherein predicting the list of labels further comprises: using, by the one or more processors, a word length N as a sliding window to obtain text data from the plurality of documents.
9. The computer system of claim 7, wherein the final label result is generated as a hard tag.
10. The computer system of claim 7, wherein determining the final label result further comprises: retaining, by the one or more processors, a prediction value of the sliding window to generate a soft tag.
11. The computer system of claim 7, wherein building the baseline algorithm further comprises: converting, by the one or more processors, the list of labels into a feature matrix comprising one or more one-hot vectors.
12. The computer system of claim 11, wherein a length of a one-vector in the one or more one-hot vectors is aligned with a longest label dimension of all labels in the list of labels.
13. A computer program product for document classification, comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: program instructions to collect, by one or more processors, a plurality of pre-trained classification models into a model pool; program instructions to collect, by the one or more processors, a plurality of documents into a document pool; program instructions to apply in parallel, by the one or more processors, the collected plurality of pre-trained classification models to the plurality of documents in the document pool to simultaneously generate a list of labels for document classification rather than generating one label at a time; program instruction to apply, by the one or more processors, a weighted voting method to the list of labels for optimizing weights of base classifiers; based on a weighted voting combination rule, program instructions to determine, by the one or more processors, a final label result from the list of labels; and program instructions to build, by the one or more processors, a baseline algorithm for document classification based on the determined final label result.
14. The computer program product of claim 13, wherein predicting the list of labels further comprises: using, by the one or more processors, a word length N as a sliding window to obtain text data from the plurality of documents.
15. The computer program product of claim 13, wherein the final label result is generated as a hard tag.
16. The computer program product of claim 13, wherein determining the final label result further comprises: retaining, by the one or more processors, a prediction value of the sliding window to generate a soft tag.
17. The computer program product of claim 13, wherein building the baseline algorithm further comprises: converting, by the one or more processors, the list of labels into a feature matrix comprising one or more one-hot vectors, wherein a length of a one-vector in the one or more one-hot vectors is aligned with a longest label dimension of all labels in the list of labels.
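The dependent claims mention an N-word sliding window, hard and soft tags, and a one-hot feature matrix whose vector length follows the longest label dimension. The sketch below shows one possible reading of those elements, not the patented implementation. It assumes that a soft tag is the average of per-window prediction values, that a hard tag is the top-scoring label, and that "longest label dimension" means the size of the largest label set among the models. The names `score_fn` and `label_spaces` are hypothetical.

```python
def sliding_windows(text, n):
    """Split text into overlapping windows of n words (stride of one word)."""
    words = text.split()
    if len(words) <= n:
        return [" ".join(words)]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def soft_and_hard_tag(text, n, score_fn):
    """Score each window, then derive a soft tag and a hard tag.

    score_fn is a hypothetical model interface: it takes a window string and
    returns a dict mapping label -> prediction value.
    """
    per_window = [score_fn(window) for window in sliding_windows(text, n)]
    labels = sorted({label for scores in per_window for label in scores})
    # Soft tag (assumed reading): keep the averaged prediction value per label.
    soft = {
        label: sum(scores.get(label, 0.0) for scores in per_window) / len(per_window)
        for label in labels
    }
    # Hard tag (assumed reading): collapse to the single best label.
    hard = max(soft, key=soft.get)
    return soft, hard

def one_hot_matrix(labels, label_spaces):
    """Encode one predicted label per model as zero-padded one-hot rows.

    labels[i] is the label predicted by model i, and label_spaces[i] is that
    model's ordered label set. Every row is padded to the largest label set.
    """
    width = max(len(space) for space in label_spaces)
    rows = []
    for label, space in zip(labels, label_spaces):
        row = [0] * width
        row[space.index(label)] = 1
        rows.append(row)
    return rows
```

For example, `one_hot_matrix(["spam", "legal"], [["ham", "spam"], ["finance", "legal", "hr"]])` pads the two-label model's row to width three so both rows fit in one matrix.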
Clustering or classification · CPC title
Tagging; Marking up (details of markup languages G06F40/143); Designating a block; Setting of attributes (style sheets, e.g. eXtensible Stylesheet Language Transformation [XSLT], G06F40/154) · CPC title
Semantic analysis · CPC title
Machine learning · CPC title