Who is the assignee on this patent?

American Express Travel Related Services Co Inc, American Express India Private Ltd

What technology area does this patent fall under?

Primary CPC classification G06F40/40. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 24, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (publications linked by citation in our corpus, or publications sharing the same primary CPC).

Natural language processing for categorizing sequences of text data

US12340182B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12340182-B2
Application number	US-202217707110-A
Country	US
Kind code	B2
Filing date	Mar 29, 2022
Priority date	Apr 1, 2021
Publication date	Jun 24, 2025
Grant date	Jun 24, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are system, method, and computer program product embodiments for categorizing sequences of text extracted from documents using natural language processing. In some embodiments, a categorization system may receive a first document file in a machine readable format. The categorization system may analyze a sequence of text from the first document file and identify a numeric text string in the sequence. The categorization system may also identify text data in the sequence matching text data from a second document file. The categorization system may remove the numeric text string and the matching data from the sequence to generate a trimmed version of the sequence. The categorization system may then apply a vectorization model to the trimmed version of the sequence as well as a trained deep learning model to the vector version to identify a corresponding category for the sequence of text.
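The trimming step the abstract describes can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the assignee's implementation: `trim_sequence` is an invented helper, a "numeric text string" is assumed to be any run of digits, "matching" is assumed to be a case-insensitive exact token match against the second document, and the bank-statement strings are made up for the example.

```python
import re

# Assumption: a "numeric text string" is any run of digits, so "INV4471" keeps "INV"
# and a purely numeric token such as "000123" disappears entirely.
DIGIT_RUN = re.compile(r"\d+")


def trim_sequence(sequence: str, second_document: str) -> str:
    """Hypothetical helper: remove digit runs and tokens shared with a second document."""
    # Case-insensitive set of tokens taken from the second document file.
    reference_tokens = {token.lower() for token in second_document.split()}
    kept = []
    for token in sequence.split():
        stripped = DIGIT_RUN.sub("", token)  # remove the numeric text string(s)
        if not stripped:
            continue  # the token was entirely numeric
        if stripped.lower() in reference_tokens:
            continue  # matches text data in the second document
        kept.append(stripped)
    return " ".join(kept)


print(trim_sequence("ACH PAYMENT 000123 ACME SUPPLIES INV4471",
                    "ACH PAYMENT REF 98765 UTILITY CO"))  # ACME SUPPLIES INV
```

Boilerplate shared with the other statement ("ACH PAYMENT") and the reference number both drop out, leaving the counterparty-specific words that a downstream vectorization model and classifier would see.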

First claim

Opening claims text (preview; cut off partway through claim 15).

What is claimed is:

1. A computer-implemented method for categorizing text data, comprising: receiving a first document file in a machine-readable format, wherein the first document file includes one or more sequences of text; analyzing a sequence of text from the one or more sequences to identify a numeric text string in the sequence of text that forms an alphanumeric reference; generating a trimmed version of the sequence of text by: removing first text data from the sequence of text based on an amount of matches between the first text data and respective text data for a plurality of document files satisfying a match threshold, and removing the numeric text string from the sequence of text to transform the alphanumeric reference to second text data; generating, based on the trimmed version of the sequence of text put into a vectorization model, a vector version of the trimmed version of the sequence of text; and generating, based on the vector version put into a deep learning model, a categorization of the sequence of text, wherein the deep learning model is pre-trained to categorize vector representations of the text data into predefined categories based on language pattern dependencies indicated by states of cells of the vector version that correspond to each portion of another sequence of text that is managed by a respective neural network of a plurality of neural networks of the deep learning model, wherein the language pattern dependencies correspond to a first language that is different from a second language indicated by the trimmed version of the sequence of text.

2. The computer-implemented method of claim 1, wherein the first document file is a commercial bank statement.

3. The computer-implemented method of claim 2, wherein the sequence of text is a row of transaction description text from the commercial bank statement.

4. The computer-implemented method of claim 1, wherein analyzing the sequence of text further comprises: applying a crowd learning algorithm to compare the first text data to text data from the plurality of document files including a second document file.

5. The computer-implemented method of claim 1, wherein the generating the vector version of the trimmed version of the sequence is further based on: a word2vec algorithm.

6. The computer-implemented method of claim 1, wherein the deep learning model is a long short-term memory (LSTM) model.

7. The computer-implemented method of claim 4, wherein the first document file includes a plurality of sequences of text, the method further comprising: analyzing the plurality of sequences to identify third text data matching fourth text data from the second document file; in response to analyzing the plurality of sequences to identify the third text data matching the fourth text data, removing the third text data from the plurality of sequences to generate a trimmed version of the plurality of sequences; applying the vectorization model to the trimmed version of the plurality of sequences to generate a vector version of the plurality of sequences; and applying the deep learning model to the vector version of the plurality of sequences to categorize each sequence from the plurality of sequences.

8. A system for categorizing text data, comprising: a memory; and at least one processor coupled to the memory and configured to: receive a first document file in a machine-readable format, wherein the first document file includes one or more sequences of text; analyze a sequence of text from the one or more sequences to identify a numeric text string in the sequence of text that forms an alphanumeric reference; generate a trimmed version of the sequence of text by: removing first text data from the sequence of text based on an amount of matches between the first text data and respective text data for a plurality of document files satisfying a match threshold, and removing one or more numeric text strings from the sequence of text; generate, based on the trimmed version of the sequence of text put into a vectorization model, a vector version of the trimmed version of the sequence of text; and generating, based on the vector version put into a deep learning model, a categorization of the sequence of text, wherein the deep learning model is pre-trained to categorize vector representations of the text data into predefined categories based on language pattern dependencies indicated by states of cells of the vector version that correspond to each portion of another sequence of text that is managed by a respective neural network of a plurality of neural networks of the deep learning model, wherein the language pattern dependencies correspond to a first language that is different from a second language indicated by the trimmed version of the sequence of text.

9. The system of claim 8, wherein the first document file is a commercial bank statement.

10. The system of claim 9, wherein the sequence of text is a row of transaction description text from the commercial bank statement.

11. The system of claim 8, wherein to analyze the sequence of text, the at least one processor is further configured to: apply a crowd learning algorithm to compare the first text data to text data from the plurality of document files including a second document file.

12. The system of claim 8, wherein to generate the vector version of the trimmed version of the sequence the at least one processor is further configured to: execute a word2vec algorithm.

13. The system of claim 8, wherein the deep learning model is a long short-term memory (LSTM) model.

14. The system of claim 11, wherein the first document file includes a plurality of sequences of text and wherein the at least one processor is further configured to: analyze the plurality of sequences to identify third text data matching fourth text data from the second document file; in response to analyzing the plurality of sequences to identify the third text data matching the fourth text data, remove the third text data from the plurality of sequences to generate a trimmed version of the plurality of sequences; apply the vectorization model to the trimmed version of the plurality of sequences to generate a vector version of the plurality of sequences; and apply the deep learning model to the vector version of the plurality of sequences to categorize each sequence from the plurality of sequences.

15. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: receiving a first document file in a machine-readable format, wherein the first document file includes one or more sequences of text; analyzing a sequence of text from the one or more sequences to identify a numeric text string in the sequence of text that forms an alphanumeric reference; generating a trimmed version of the sequence of text by: removing first text data from the sequence of text based on an amount of matches between the first text data and respective text data for a plurality of document files satisfying a match threshold, and removing one or more numeric text strings from the sequence of text; generating, based on the trimmed version of the sequence put into a vectorization model, a vector version of the trimmed version of the sequence of text; and generating, based on the vector version put into a deep learning model, a categorization of the sequence of text, wherein the deep learning model is pre-trained to categorize vector representations of text data into predefin…
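Claim 1 narrows the matching step: text is removed when it matches text across enough documents to satisfy a match threshold, and removing the numeric string turns an alphanumeric reference into plain text. The Python sketch below illustrates one reading of that and then builds the per-token vector sequence that a recurrent model such as the LSTM of claim 6 would consume. Everything here is an assumption for illustration: the helper names (`common_tokens`, `trim`, `vectorize`), treating "amount of matches" as the number of documents containing a token, mapping unknown tokens to a zero vector, and the two-dimensional toy embeddings standing in for trained word2vec vectors (claim 5).

```python
import re
from collections import Counter

DIGIT_RUN = re.compile(r"\d+")  # assumed definition of a numeric text string


def common_tokens(documents: list[str], match_threshold: int) -> set[str]:
    """Tokens found in at least `match_threshold` documents (hypothetical reading)."""
    counts = Counter()
    for doc in documents:
        # Count each token once per document, so the count is a document frequency.
        counts.update({token.lower() for token in doc.split()})
    return {token for token, n in counts.items() if n >= match_threshold}


def trim(sequence: str, documents: list[str], match_threshold: int) -> list[str]:
    """Drop widely shared tokens, then strip digit runs from what remains."""
    shared = common_tokens(documents, match_threshold)
    kept = []
    for token in sequence.split():
        if token.lower() in shared:
            continue  # "first text data": matches enough documents to be removed
        stripped = DIGIT_RUN.sub("", token)  # e.g. "TXN9981" becomes "TXN"
        if stripped:
            kept.append(stripped)
    return kept


def vectorize(tokens: list[str], embeddings: dict[str, list[float]],
              dim: int) -> list[list[float]]:
    """One vector per token, in order; unknown tokens map to a zero vector."""
    return [embeddings.get(token.lower(), [0.0] * dim) for token in tokens]


documents = [
    "POS PURCHASE STORE 12",
    "POS PURCHASE FUEL 7",
    "ACH CREDIT PAYROLL",
]
tokens = trim("POS PURCHASE COFFEE SHOP 0042 TXN9981", documents, match_threshold=2)
print(tokens)  # ['COFFEE', 'SHOP', 'TXN']

toy_embeddings = {"coffee": [0.9, 0.1], "shop": [0.7, 0.3]}  # hypothetical vectors
print(vectorize(tokens, toy_embeddings, dim=2))  # [[0.9, 0.1], [0.7, 0.3], [0.0, 0.0]]
```

The sketch keeps one vector per token rather than averaging them, because the claims describe a model whose cell states track each portion of the sequence; collapsing the sequence into a single mean vector would discard that ordering.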


Classifications

CPC G06V30/412: Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
CPC G06V30/418: Document matching, e.g. of document images
CPC G06F40/232: Orthographic correction, e.g. spell checking or vowelisation
CPC G06Q40/03: Credit; Loans; Processing thereof
CPC G06Q40/02: Banking, e.g. interest calculation or account maintenance (credit or loans G06Q40/03)

Patent family

Related publications grouped by family.

Patent family ID: 83458151



Related patents

Pre-trained contextual embedding models for named entity recognition and confidence prediction

Neural network system for text classification

Ai-driven transaction management system

Classifying digital documents in multi-document transactions based on embedded dates
