Systems and methods for data extraction from electronic documents using data patterns

US11625419B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11625419-B2
Application number	US-202017064150-A
Country	US
Kind code	B2
Filing date	Oct 6, 2020
Priority date	Oct 6, 2020
Publication date	Apr 11, 2023
Grant date	Apr 11, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for extracting data from electronic documents based on data patterns. The method includes receiving electronic template documents. Each template document corresponds to a type of electronic document. The method further includes, for each template document, processing the template document using a text extraction and data processing application. The method also includes, for each template document, determining a data extraction formula corresponding to the type of electronic document. The method further includes, storing the data extraction formula in a first database. The method also includes, receiving an electronic document including user data and a Unicode corresponding to the type of document. The method also includes, processing and classifying the electronic document into the type of document corresponding to the Unicode. The method also includes identifying data elements in the electronic document based on the data extraction formula and extracting data values for each of the identified data elements.
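The flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the "data extraction formula" is modeled as a set of regular expressions per document type, the "Unicode" is treated as an opaque type code (`"STMT"`), and the first database is a plain dictionary (`FORMULAS`). The patent text on this page does not say how the formula is actually represented.

```python
import re
from dataclasses import dataclass


@dataclass
class ExtractionFormula:
    """Hypothetical stand-in for a data extraction formula:
    one regex (with a single capture group) per data element."""
    doc_type: str
    patterns: dict  # data element name -> regex string


# Stands in for the "first database" of formulas, keyed by a type code.
# The code "STMT" and both patterns are invented for this example.
FORMULAS = {
    "STMT": ExtractionFormula(
        doc_type="account_statement",
        patterns={
            "account_number": r"Account Number:\s*(\S+)",
            "balance": r"Balance:\s*\$?([\d,]+\.\d{2})",
        },
    ),
}


def extract(pages, type_code):
    """Look up the formula for the type code, then return rows of
    (data element, value, page number) for every match found."""
    formula = FORMULAS[type_code]  # classification reduced to a lookup
    rows = []
    for page_number, text in enumerate(pages, start=1):
        for element, pattern in formula.patterns.items():
            match = re.search(pattern, text)
            if match:
                rows.append((element, match.group(1), page_number))
    return rows


pages = ["Account Number: 12345\nName: J. Doe", "Balance: $1,024.50"]
rows = extract(pages, "STMT")
for row in rows:
    print(row)
```

The returned rows pair each value with its page number, loosely mirroring the second database of values and locations mentioned in claim 1.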

First claim

Opening claim text (preview).

What is claimed:

1. A computerized method for extracting data from electronic documents based on a plurality of data patterns, the method comprising: receiving, by a server computing device, a plurality of electronic template documents, wherein each electronic template document corresponds to a type of electronic document; for each of the plurality of electronic template documents, processing, by the server computing device, the electronic template document using a text extraction and data processing application; for each of the plurality of electronic template documents, determining, by the server computing device, a data extraction formula corresponding to the type of electronic document; storing, by the server computing device, the data extraction formula for each of the plurality of electronic template documents in a first database; receiving, by the server computing device, an electronic document comprising user data and a Unicode corresponding to the type of electronic document; processing, by the server computing device, the electronic document using the text extraction and data processing application; classifying, by the server computing device, the electronic document into the type of electronic document corresponding to the Unicode; identifying, by the server computing device, data elements in the electronic document based on the data extraction formula corresponding to the type of electronic document; extracting, by the server computing device, data values for each of the identified data elements in the electronic document; and generating, by the server computing device, a second database comprising the data values for each of the identified data elements in the electronic document and locations of the identified data elements.

2. The computerized method of claim 1, wherein processing the electronic template document comprises: identifying, by the server computing device, a header and a footer based on a similarity score; and removing, by the server computing device, the header and footer from the electronic template document.

3. The computerized method of claim 1, wherein the data extraction formula corresponds to locations of data elements in the electronic template document.

4. The computerized method of claim 1, wherein processing the electronic document comprises: identifying, by the server computing device, a header and a footer based on a similarity score; and removing, by the server computing device, the header and footer from the electronic document.

5. The computerized method of claim 1, wherein classifying the electronic document into the type of electronic document is further based on an organization corresponding to the type of electronic document.

6. The computerized method of claim 1, wherein identifying the data elements in the electronic document further comprises: calculating, by the server computing device, a cosine similarity score based on the electronic document and the electronic template document corresponding to the document type; and benchmarking, by the server computing device, the cosine similarity scores.

7. The computerized method of claim 1, wherein the locations of the identified data elements correspond to a page number of the electronic document.

8. The computerized method of claim 1, wherein the server computing device is further configured to receive the plurality of electronic template documents from a plurality of data sources.

9. A system for extracting data from electronic documents based on a plurality of data patterns, the system comprising: a server computing device communicatively coupled to a first database and a second database over a network, the server computing device configured to: receive a plurality of electronic template documents, wherein each electronic template document corresponds to a type of electronic document; for each of the plurality of electronic template documents, process the electronic template document using a text extraction and data processing application; for each of the plurality of electronic template documents, determine a data extraction formula corresponding to the type of electronic document; store the data extraction formula for each of the plurality of electronic template documents in the first database; receive an electronic document comprising user data and a Unicode corresponding to the type of electronic document; process the electronic document using the text extraction and data processing application; classify the electronic document into the type of electronic document corresponding to the Unicode; identify data elements in the electronic document based on the data extraction formula corresponding to the type of electronic document; extract data values for each of the identified data elements in the electronic document; and generate the second database comprising the data values for each of the identified data elements in the electronic document and locations of the identified data elements.

10. The system of claim 9, wherein the server computing device is further configured to process the electronic template document by: identifying a header and a footer based on a similarity score; and removing the header and footer from the electronic template document.

11. The system of claim 9, wherein the data extraction formula corresponds to locations of data elements in the electronic template document.

12. The system of claim 9, wherein the server computing device is further configured to process the electronic document by: identifying a header and a footer based on a similarity score; and removing the header and footer from the electronic document.

13. The system of claim 9, wherein classifying the electronic document into the type of electronic document is further based on an organization corresponding to the type of electronic document.

14. The system of claim 9, wherein the server computing device is further configured to identify the data elements in the electronic document by: calculating a cosine similarity score based on the electronic document and the electronic template document corresponding to the document type; and benchmarking the cosine similarity scores.

15. The system of claim 9, wherein the locations of the identified data elements correspond to a page number of the electronic document.

16. The system of claim 9, wherein the server computing device is further configured to receive the plurality of electronic template documents from a plurality of data sources.

17. A computerized method for extracting data from electronic documents based on a plurality of data patterns, the method comprising: receiving, by the server computing device, an electronic document comprising user data and a Unicode corresponding to a type of electronic document; identifying, by the server computing device, a header and a footer of the electronic document based on a similarity score; removing, by the server computing device, the identified header and footer from the electronic document based on the similarity score; classifying, by the server computing device, the electronic document into the type of electronic document corresponding to the Unicode; identifying, by the server computing device, data elements in the electronic document based on a data extraction formula corresponding to the type of electronic document; extracting, by the server computing device, data values for each of the identified data elements in the electronic document; and generate, by the server computing device, a database comprising the data values for
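Several claims rely on a similarity score for header and footer removal, and the dependent claims on identifying data elements name a cosine similarity score between the document and its template. The sketch below is one hedged way to picture both ideas. The bag-of-words vectors, the choice to compare only the first and last line of each page, and the `threshold` value of 0.8 are all assumptions for illustration; the claim text shown on this page does not specify how the scores are computed or benchmarked.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def strip_header_footer(pages, threshold=0.8):
    """Drop the first and/or last line of every page when that line is
    similar (score >= threshold) across all pages. Hypothetical heuristic."""
    if len(pages) < 2:
        return list(pages)
    lines = [page.splitlines() for page in pages]

    def repeated(index):
        # Every page must have a line at this position to compare.
        if any(not page_lines for page_lines in lines):
            return False
        first = lines[0][index]
        return all(cosine_similarity(first, page_lines[index]) >= threshold
                   for page_lines in lines[1:])

    drop_first, drop_last = repeated(0), repeated(-1)
    cleaned = []
    for page_lines in lines:
        start = 1 if drop_first else 0
        end = len(page_lines) - 1 if drop_last else len(page_lines)
        cleaned.append("\n".join(page_lines[start:end]))
    return cleaned


pages = [
    "ACME Bank Statement\nBalance: 10\nPage 1 of 2",
    "ACME Bank Statement\nFee: 2\nPage 2 of 2",
]
cleaned = strip_header_footer(pages)
print(cleaned)
```

The same `cosine_similarity` helper could be applied to a whole document and a template text to obtain a document-to-template score; what "benchmarking" those scores involves is left open here because the page does not describe it.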

Assignees

FMR LLC

Inventors

Classifications

G06F40/186 · Primary
Templates · CPC title
G06F16/93
Document management systems · CPC title
G06F16/355
Creation or modification of classes or clusters · CPC title
G06V30/416
Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title
G06F16/285 · Primary
Clustering or classification · CPC title

Patent family

Related publications grouped by family.

View patent family 80931391

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11625419B2 cover?: Systems and methods for extracting data from electronic documents based on data patterns. The method includes receiving electronic template documents. Each template document corresponds to a type of electronic document. The method further includes, for each template document, processing the template document using a text extraction and data processing application. The method also includes, for …
Who is the assignee on this patent?: FMR LLC
What technology area does this patent fall under?: Primary CPC classification G06F40/186. Mapped technology areas include Physics.
When was this patent published?: Publication date April 11, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).