What technology area does this patent fall under?

Primary CPC classification: G06F16/353. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 1, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for classifying textual data blocks

US12346364B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12346364-B2
Application number	US-202218087069-A
Country	US
Kind code	B2
Filing date	Dec 22, 2022
Priority date	Dec 22, 2022
Publication date	Jul 1, 2025
Grant date	Jul 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and a system of classifying textual data blocks are claimed. The method includes receiving at least one textual data block in an original version, including a plurality of textual data elements; performing a preprocessing procedure on the at least one textual data block in the original version, wherein the preprocessing procedure includes replacing the textual data elements characterized by pertinence to at least one specific part-of-speech (POS) category with a respective POS token, thereby obtaining the at least one textual data block in a preprocessed version; inferring a pretrained ML-based model on the at least one textual data block in the preprocessed version, to classify the at least one textual data block by pertinence to the at least one class.
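The POS-replacement step in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the patent text does not name a tagger, a tag set, or a token format. `toy_tagger` is a crude stand-in that treats any capitalized word after the first as a proper noun; a real system would plug in a trained POS tagger with the same call shape.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical tagger interface (not from the patent): words in, (word, tag) pairs out.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]


def toy_tagger(words: List[str]) -> List[Tuple[str, str]]:
    """Crude stand-in for a real POS tagger (assumption for illustration).

    Tags a word as PROPN when it is capitalized and is not the first word
    of the block; everything else is tagged OTHER.
    """
    tagged = []
    for index, word in enumerate(words):
        is_proper_noun = index > 0 and word[:1].isupper()
        tagged.append((word, "PROPN" if is_proper_noun else "OTHER"))
    return tagged


def replace_pos(
    block: str,
    tagger: Tagger = toy_tagger,
    categories: FrozenSet[str] = frozenset({"PROPN"}),
) -> str:
    """Replace every word whose tag is in `categories` with a `<TAG>` token."""
    words = block.split()
    return " ".join(
        f"<{tag}>" if tag in categories else word for word, tag in tagger(words)
    )
```

With the toy tagger, `replace_pos("Best regards from Alice Smith")` yields `Best regards from <PROPN> <PROPN>`. The point of the substitution is that a classifier sees the same token for any person or company name, so it can learn the shape of a block rather than memorizing specific names.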

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of classifying textual data blocks by at least one processor, the method comprising:
receiving textual data blocks in an original version, each of the textual data blocks comprising a plurality of textual data elements;
performing a preprocessing procedure on each of the textual data blocks in the original version, wherein the preprocessing procedure comprises:
replacing, for each of the textual data blocks, each of the textual data elements characterized by presence of a specific character or sequence of characters with a respective character-based token to generate a first partial tokenized version of the respective textual data block;
replacing, for each first partial tokenized version of the textual data blocks, each of the first partial tokenized versions further characterized by a specific contextual definition with a respective context-based token to generate a second partial tokenized version of the respective first partial tokenized first; and
replacing, for each second partial tokenized version of the textual data blocks, each of the second partial tokenized versions further characterized by pertinence to at least one specific part-of-speech (POS) category with a respective POS token, thereby obtaining, for each of the textual data blocks, the textual data block in a preprocessed tokenized version of the respective textual data block;
forming a training dataset comprising the textual data blocks in the preprocessed tokenized version labeled with an indication of pertinence to at least once class indicative of an email signature block;
training a machine learning-based (ML-based) model to classify textual data blocks by pertinence to the at least one class, based on the training dataset, wherein the ML-based model comprises an artificial neural network;
receiving a new textual data block in an original version, the new textual data block comprising a plurality of textual data elements and performing the preprocessing procedure to obtain, for the new textual data block, the new textual data block in the preprocessed tokenized version;
performing machine learning, using the trained ML-based model, on the new textual data block in the preprocessed tokenized version to classify the new textual data block by pertinence to the at least one class.

2. The method of claim 1, wherein the at least one specific POS category is “proper noun” category; and the preprocessing procedure further comprises defining the textual data elements as pertaining to a “proper noun” category.

3. The method of claim 1, wherein the specific character or sequence of characters are a character or sequence of characters specific for at least one of: an email address, an alphanumeric or numeric code, and a Uniform Resource Locator (URL).

4. The method of claim 1, wherein the specific contextual definition represents a definition of the textual data elements as pertaining to a “named entity” category.

5. The method of claim 1, wherein each one of the textual data blocks comprises textual data elements arranged in at least one line; and the method further comprises preliminarily classifying each one of the textual data blocks by pertinence to the at least one class based on a length of the at least one line of the textual data elements.

6. The method of claim 1, wherein the preprocessing procedure further comprises embedding each preprocessed tokenized version of the textual data blocks into a vector space; wherein each one of the textual data blocks represents a paragraph of an email and the textual data elements represent words.

7. The method of claim 6, wherein embedding each preprocessed tokenized version of the textual data blocks into a vector space comprises creating a vector representation of each textual data element based on a term frequency-inverse document frequency (TF-IDF) measure.

8. The method of claim 1, wherein each one of the textual data blocks represents a paragraph of an email and the textual data elements represent words.

9. The method of claim 1, wherein each one of the textual data blocks represents an email section of an email.

10. A system for classifying textual data blocks, the system comprising: a non-transitory memory device, wherein modules of instruction code are stored, and at least one processor associated with the memory device, and configured to execute the modules of instruction code, whereupon execution of said modules of instruction code, the at least one processor is configured to:
receive textual data blocks in an original version, each of the textual data blocks comprising a plurality of textual data elements;
perform a preprocessing procedure on each of the textual data blocks in the original version, wherein the preprocessing procedure comprises:
replacing, for each of the textual data blocks, each of the textual data elements characterized by presence of a specific character or sequence of characters with a respective character-based token to generate a first partial tokenized version of the respective textual data block;
replacing, for each first partial tokenized version of the textual data blocks, each of the first partial tokenized versions further characterized by a specific contextual definition with a respective context-based token to generate a second partial tokenized version of the respective first partial tokenized first; and
replacing, for each second partial tokenized version of the textual data blocks, each of the second partial tokenized versions further characterized by pertinence to at least one specific part-of-speech (POS) category with a respective POS token, thereby obtaining, for each of the textual data blocks, the textual data block in a preprocessed tokenized version of the respective textual data block;
form a training dataset comprising the textual data blocks in the preprocessed tokenized version labeled with an indication of pertinence to at least once class indicative of an email signature block;
train a machine learning-based (ML-based) model to classify textual data blocks by pertinence to the at least one class, based on the training dataset, wherein the ML-based model comprises an artificial neural network;
receive a new textual data block in an original version, the new textual data block comprising a plurality of textual data elements and performing the preprocessing procedure to obtain, for the new textual data block, the new textual data block in the preprocessed version;
perform machine learning, using the trained ML-based model, on the new textual data block in the preprocessed tokenized version to classify the new textual data block by pertinence to the at least one class.

11. The system of claim 10, wherein the at least one specific POS category is a “proper noun” category; and the preprocessing procedure further comprises defining the textual data elements as pertaining to a “proper noun” category.

12. The system of claim 10, wherein the specific character or sequence of characters are a character or sequence of characters specific for at least one of: an email address, an alphanumeric or numeric code, and a Uniform Resource Locator (URL).

13. The system of claim 10, wherein the specific contextual definition represents a definition of the textual data elements as pertaining to a “named entity” category.

14. The system of claim 10, wherein each one of the textual data blocks comprises textual data elements arranged in at least one line; and the at least one processor is further configured to preliminarily classify each one of the textual data blocks by pertinence to the at least one class based on a length of the at
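Two of the dependent claims lend themselves to a short illustration: the character-based tokens of claims 3 and 12 (email address, code, URL) and the TF-IDF embedding of claim 7. The sketch below is not the patented implementation. The regular expressions, the token names `<EMAIL>`, `<URL>` and `<CODE>`, and the particular TF-IDF variant (term frequency normalized by block length, natural-log inverse document frequency) are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Illustrative patterns only; the claims do not specify regexes or token names.
# Order matters: the first pattern that matches the whole word wins.
CHAR_PATTERNS = [
    ("<EMAIL>", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
    ("<URL>", re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)),
    # Any word containing at least one digit (phone number, postcode, ID).
    ("<CODE>", re.compile(r"(?=.*\d)[\w+()\-./]+")),
]


def replace_char_tokens(block: str) -> str:
    """Replace whitespace-separated words that fully match a pattern with its token."""
    out = []
    for word in block.split():
        for token, pattern in CHAR_PATTERNS:
            if pattern.fullmatch(word):
                word = token
                break
        out.append(word)
    return " ".join(out)


def tfidf(blocks: List[str]) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per block.

    Assumed variant: tf = count / block length, idf = ln(N / document frequency).
    """
    docs = [block.split() for block in blocks]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


if __name__ == "__main__":
    raw = [
        "Jane Doe jane.doe@example.com +1-555-0100",
        "see https://example.com for details",
    ]
    tokenized = [replace_char_tokens(block) for block in raw]
    print(tokenized)
    print(tfidf(tokenized))
```

Splitting on whitespace means a word with trailing punctuation (for example an address followed by a comma) is left unchanged; a fuller tokenizer would handle that case. Running the replacement before TF-IDF keeps the vocabulary small, since every address or phone number collapses to a single term.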

Assignees

Genesys Cloud Services Inc

Inventors

Classifications

G06N3/08
Learning methods · CPC title
G06F40/279
Recognition of textual entities · CPC title
G06F40/205
Parsing · CPC title
G06F40/268
Morphological analysis · CPC title
G06F16/353 (Primary)
into predefined classes · CPC title

Patent family

Related publications grouped by family.

View patent family 89661317
