Intelligent data extraction

US10402163B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10402163-B2
Application number: US-201715432039-A
Country: US
Kind code: B2
Filing date: Feb 14, 2017
Priority date: Feb 14, 2017
Publication date: Sep 3, 2019
Grant date: Sep 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Electronically received data is validated based on a digital data image that is scanned from a paper document. Known paper document source entities, paper document types and associated paper document configuration information are stored in a database. The paper documents are converted to digital data images and optically processed to identify respective source entity and document type information represented within the digital data images. Appropriate document configuration information is retrieved based on association with the detected type of document. Validation target data is extracted from the digital data images based on the configuration information and processed. The electronically received data is validated based on the extracted and processed validation target data.
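The flow in the abstract can be illustrated with a minimal sketch. It assumes the optical processing step has already produced plain text, and it swaps in simple substring and regular-expression matching for the identification and extraction steps. All names here (`CONFIGS`, `DocumentConfig`, `identify`, `extract`, `validate`), the sample entity "Acme Bank", and the regex-based configuration format are hypothetical and are not taken from the patent.

```python
import re
from dataclasses import dataclass


@dataclass
class DocumentConfig:
    # Hypothetical configuration format: target field name -> regex with one capture group.
    field_patterns: dict


# Hypothetical in-memory stand-in for the database of known
# (source entity, document type) pairs and their configurations.
CONFIGS = {
    ("Acme Bank", "bank statement"): DocumentConfig(
        field_patterns={
            "account_number": r"Account Number:\s*(\d+)",
            "closing_balance": r"Closing Balance:\s*\$?([\d,]+\.\d{2})",
        }
    ),
}


def identify(text):
    """Return the first known (entity, document type) pair named in the text, or None."""
    lowered = text.lower()
    for entity, doc_type in CONFIGS:
        if entity.lower() in lowered and doc_type in lowered:
            return entity, doc_type
    return None


def extract(text, config):
    """Pull validation target data out of the text using the retrieved configuration."""
    found = {}
    for name, pattern in config.field_patterns.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            # Normalize thousands separators so values compare as plain strings.
            found[name] = match.group(1).replace(",", "")
    return found


def validate(submitted, text):
    """Compare electronically received data with data extracted from the document text."""
    key = identify(text)
    if key is None:
        return {"status": "unrecognized_document", "mismatches": []}
    extracted = extract(text, CONFIGS[key])
    mismatches = sorted(k for k, v in submitted.items() if extracted.get(k) != v)
    return {"status": "mismatch" if mismatches else "match", "mismatches": mismatches}
```

Calling `validate` with the user-entered values and the recognized text of one scanned page returns a small result dictionary that could serve as the response to a validation request. A real system would need OCR, more robust entity detection, and per-field normalization.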

First claim

Opening claim text (preview).

We claim:

1. A method for validating electronic data, the method comprising: storing in a database, data representative of: a plurality of entities; one or more document types, wherein each of the one or more document types is associated one or more of the plurality of entities; and one or more configurations, wherein each of the one or more configurations comprises configuration information associated with one or more of the document types; receiving an electronic data validation request, the request comprising electronic data input by a user comprising customer specific data to be validated by comparison with data extracted from paper documents provided by the customer; receiving digital data images representing electronically scanned paper documents, associated with the electronic data validation request, said paper documents being provided by the customer; optically processing each respective digital data image to convert the image into machine readable text, identifying from the machine readable text an entity corresponding to one of the stored plurality of entities that is represented within the respective digital data image, and determining a document type corresponding to one of the one or more document types from the respective digital data image; retrieving configuration information corresponding to each respective digital data image having an identified entity and determined document type based on the identified entity and the determined document type of the respective digital data image; extracting target data from each respective digital data image for which configuration information is retrieved, wherein the extracting is based on the configuration information corresponding to each respective digital data image; validating the electronic data input by the user by comparing it with the target data extracted from each respective digital data image of the one or more scanned paper documents associated with the electronic data validation request and determining whether the electronic data and target data match; and transmitting a response to the electronic data validation request indicating the result of the electronic data validation.

2. The method for validating electronic data of claim 1, wherein the entity represented within the digital data image indicates a source of a corresponding paper document.

3. The method for validating electronic data of claim 1, wherein the configuration information corresponding to an entity and document type indicate types of data, data or data field locations and data formatting information for a digital data image corresponding to the entity and document type.

4. The method for validating electronic data of claim 1, wherein the configuration information is stored in the database as Javascript object notation (JSON) and indicates from which fields of a corresponding respective digital data image to extract the validation target data.

5. The method for validating electronic data of claim 1, further comprising storing the extracted target data in JSON format.

6. The method for validating electronic data of claim 1, wherein the entity of the plurality of entities is identified utilizing natural language processing and data extraction techniques.

7. The method for validating electronic data of claim 1, wherein the document type of the one or more document types is detected from the respective digital data image utilizing fuzzy matching techniques or approximate string matching techniques.

8. The method for validating electronic data of claim 1, wherein the target data is searched and extracted from each respective digital data image for which configuration information is retrieved based on one or more of artificial intelligent techniques, behavior tree searching, fuzzy matching, AhoCorasick algorithm and pattern recognition techniques.

9. The method for validating electronic data of claim 1, further comprising: creating a new self-learned configuration in instances when the configuration information is not available; searching within a digital image data document corresponding to the detected document type for numerical expressions that represent various known types of target data based on known target data formatting conventions; searching for column headers that correspond to the numerical expressions; searching the column headers and the numerical expressions for keywords and symbols that characterize the data of each column; storing in the database, the new self-learned configuration, locations and formatting of found types of target data, the column headers, and the data characteristics; and associating the new self-learned configuration with a detected document type.

10. A system for validating electronic data, the system comprising: a computer processor; a memory that stores instructions, wherein the instructions when executed by the computer processor cause the computer processor to: store in a database: data representative of: a plurality of entities; one or more document types, wherein each of the one or more document types is associated one or more of the plurality of entities; and one or more configurations, wherein each of the one or more configurations comprises configuration information associated with one or more of the document types; receive an electronic data validation request, the request comprising electronic data input by a user comprising customer specific data to be validated by comparison with data extracted from paper documents provided by the customer; receive digital data images representing electronically scanned paper documents associated with the electronic data validation request, said paper documents being provided by the customer; optically process each respective digital data image to convert the image into machine readable text, identify from the machine readable text an entity corresponding to one of the stored plurality of entities that is represented within the respective digital data image, and determine a document type corresponding to one of the one or more document types from the respective digital data image; retrieve configuration information corresponding to each respective digital data image having an identified entity and determined document type based on the identified entity and the determined document type of the respective digital data image; extract target data from each respective digital data image for which configuration information is retrieved, wherein the extraction is based on the configuration information corresponding to each respective digital data image; and validate the electronic data input by the user by comparing it with the target data extracted from each respective digital data image of the one or more scanned paper documents associated with the electronic data validation request and determining whether the electronic data and target data match; and transmit a response to the electronic data validation request indicating the result of the electronic data validation.

11. The system for validating electronic data of claim 10, wherein the entity represented within the digital data image indicates a source of a corresponding paper document.

12. The system for validating electronic data of claim 10, wherein the configuration information corresponding to an entity and document type indicate types of data, data or data field locations and data formatting information for a digital data image corresponding to the entity and document type.

13. The system for validating electronic data of claim 10, wherein the configuration information is stored in the database as Javascript object notation (JSON) and indicates from which fiel
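Claims 4, 5 and 7 mention JSON-stored configuration, JSON-stored extracted data, and fuzzy or approximate string matching for document type detection. The sketch below shows one way these pieces could fit together, using Python's standard `difflib` for the approximate matching. The configuration schema (`fields`, `name`, `label`), the list of document types, the 0.8 similarity cutoff, and the function names are illustrative assumptions; the patent text shown here does not specify them.

```python
import difflib
import json

# Hypothetical list of document types known to the database.
KNOWN_DOCUMENT_TYPES = ["bank statement", "pay stub", "utility bill"]

# Hypothetical JSON configuration naming the fields to extract and the
# printed label that precedes each value on the page.
CONFIG_JSON = """
{
  "document_type": "bank statement",
  "fields": [
    {"name": "account_number", "label": "Account Number"},
    {"name": "closing_balance", "label": "Closing Balance"}
  ]
}
"""


def detect_document_type(ocr_lines, cutoff=0.8):
    """Return the first known document type that approximately matches an OCR line."""
    for line in ocr_lines:
        candidate = line.strip().lower()
        # get_close_matches tolerates small OCR errors such as a dropped letter.
        matches = difflib.get_close_matches(
            candidate, KNOWN_DOCUMENT_TYPES, n=1, cutoff=cutoff
        )
        if matches:
            return matches[0]
    return None


def extract_fields(ocr_lines, config):
    """Collect the value following each configured label of the form 'Label: value'."""
    result = {}
    for field in config["fields"]:
        prefix = field["label"].lower() + ":"
        for line in ocr_lines:
            if line.lower().startswith(prefix):
                result[field["name"]] = line[len(prefix):].strip()
                break
    return result


def extract_as_json(ocr_lines, config_json=CONFIG_JSON):
    """Load the JSON configuration, extract the fields, and serialize them as JSON."""
    config = json.loads(config_json)
    return json.dumps(extract_fields(ocr_lines, config), sort_keys=True)
```

With OCR lines such as "Bank Statment" (one letter dropped), the detector would still be expected to resolve the type to "bank statement", after which the JSON configuration drives the extraction. Label-prefix matching is a deliberate simplification; the claims also mention field locations and formatting information, which this sketch does not model.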

Assignees

Accenture Global Solutions Ltd

Inventors

Classifications

  • Indexing; Data structures therefor; Storage structures · CPC title

  • using natural language analysis · CPC title

  • G06F7/02 · Primary

    Comparing digital values (G06F7/06, {G06F7/22,} G06F7/38 take precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10402163B2 cover?
Electronically received data is validated based on a digital data image that is scanned from a paper document. Known paper document source entities, paper document types and associated paper document configuration information are stored in a database. The paper documents are converted to digital data images and optically processed to identify respective source entity and document type informati…
Who is the assignee on this patent?
Accenture Global Solutions Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/3344. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 3, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).