Who is the assignee on this patent?

Canon Information & Imaging Solutions Inc, Canon Usa Inc

What technology area does this patent fall under?

Primary CPC classification G06V30/412. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 11, 2020 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for extracting data from a non-structured document

US10740372B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10740372-B2
Application number	US-201615085781-A
Country	US
Kind code	B2
Filing date	Mar 30, 2016
Priority date	Apr 2, 2015
Publication date	Aug 11, 2020
Grant date	Aug 11, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data object representing an electronic document having a plurality of data items each having at least one data value associated therewith is loaded from memory. The data object is searched for plurality of data items by keyword search for at least one candidate target data item. A target data item is selected by identifying at least one ancillary data item known to be located within the electronic document proximate to the at least one candidate. A target field within the electronic document is generated to encapsulate the at least one data value associated with the selected target data item. A format of the at least one data value is compared with a predetermined data value format and extracted from the target field in response to the format of the at least one data value matching the predetermined data value format for storage in a table of a database.
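
The flow in the abstract (keyword search, selection by a nearby ancillary item, a target field, then a format check) can be sketched as below. This is only an illustration under stated assumptions, not the patented implementation: the `Item` layout model, the `near` proximity test, the `extract_value` function, and all parameter names are hypothetical, and the field geometry follows the wording of claim 1 (same height as the target item, extending horizontally to a predetermined position).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One recognized text item and its bounding box (hypothetical model)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def near(a, b, max_gap):
    """Assumed proximity test: same text line and a small horizontal gap."""
    same_line = abs(a.y - b.y) <= a.height / 2
    gap = max(a.x - (b.x + b.width), b.x - (a.x + a.width))
    return same_line and gap <= max_gap


def extract_value(items, keyword, ancillary, value_pattern, right_limit, max_gap=20.0):
    """Return the first value in the target field matching value_pattern, else None."""
    # 1. Keyword search for candidate target data items.
    candidates = [i for i in items if i.text.lower() == keyword.lower()]
    # 2. Select the candidate that has the expected ancillary item nearby.
    target = None
    for cand in candidates:
        if any(o.text.lower() == ancillary.lower() and near(cand, o, max_gap)
               for o in items):
            target = cand
            break
    if target is None:
        return None
    # 3. Target field: as tall as the target item, running from its right
    #    edge to a predetermined horizontal position (right_limit).
    left = target.x + target.width
    top, bottom = target.y, target.y + target.height
    in_field = [
        i for i in items
        if left <= i.x and i.x + i.width <= right_limit
        and top <= i.y + i.height / 2 <= bottom
    ]
    # 4. Extract only a value whose format matches the predetermined format.
    for item in sorted(in_field, key=lambda i: i.x):
        if re.fullmatch(value_pattern, item.text):
            return item.text
    return None


# Hypothetical OCR output for two lines: "Invoice Date ..." and "Due Date ...".
items = [
    Item("Invoice", 10, 10, 50, 12), Item("Date", 70, 10, 30, 12),
    Item("2016-03-30", 130, 10, 70, 12),
    Item("Due", 10, 40, 30, 12), Item("Date", 50, 40, 30, 12),
    Item("2016-04-30", 130, 40, 70, 12),
]
due_date = extract_value(items, "Date", "Due", r"\d{4}-\d{2}-\d{2}", right_limit=300.0)
print(due_date)
```

In this sketch the keyword "Date" matches two items, and the ancillary word "Due" decides which one becomes the target. A value that sits in the field but fails the pattern is not extracted.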

First claim

Opening claim text (preview).

What is claimed is:

1. A method of extracting data from an electronic document comprising: loading, from a memory, a data object representing an electronic document having a plurality of data items each having at least one data value associated therewith; searching the plurality of data items in the electronic document by keyword search for at least one candidate target data items; selecting a target data item from the at least one candidate target data items by identifying at least one ancillary data item known to be located within the electronic document proximate to the at least one candidate target data items; generating a target field within the electronic document to encapsulate the at least one data value associated with the selected target data item, the target field having a predetermined height substantially equal to a height of the target data item, the target field extending horizontally within the electronic document in a direction away from the target data item and extending to a predetermined position in the electronic document; comparing a format of the at least one data value with a predetermined data value format; and extracting the at least one data value from the target field in response to the format of the at least one data value matching the predetermined data value format for storage in a table of a database.

2. The method according to claim 1, further comprising generating a compound data item field around the candidate data item and the at least one ancillary data item; comparing a format of the compound data item field with a predetermined format; and selecting, as the target data item, the candidate target data item when the format of the compound data item field matches the predetermined format.

3. The method according to claim 1, further comprising in response to identifying a plurality of candidate data items, generating a data field around each of the candidate data items; extending the data field to form a compound data item field in a direction away from each candidate data item to identify the at least one ancillary data item; and selecting, as the target data item, the candidate data item within the compound data item field including a predetermined first ancillary data item within a predetermined distance from the respective candidate data item.

4. The method according to claim 3, wherein in response to determining that more than one candidate data item is within a predetermined distance to the first ancillary data item, extending the compound data item field further in one of a same direction or different direction to identify at least one further ancillary data item within a predetermined distance from each of the candidate data item and the first ancillary data item; and selecting, as the target data item, the candidate data item from within the compound data item field that is a predetermined distance from each of the first ancillary data item and at least one further ancillary data item.

5. The method according to claim 1, wherein the step of generating a target field further comprises creating the target field having a predetermined height substantially equal to a height of the target data item, the target field beginning at a position in the electronic document a predetermined distance from a margin thereof and aligned with the selected target data item, the target field begin sequentially extending horizontally within the electronic document in a direction towards the target data item.

6. The method according to claim 1, further comprising receiving electronic document data; and performing an optical character recognition process on the electronic document data to create the data object.

7. A server apparatus that extracts data from an electronic document, the server comprising: a controller; a memory coupled to the controller storing instructions that, when executed by the controller control the server to load, from a memory, a data object representing an electronic document having a plurality of data items each having at least one data value associated therewith; search the plurality of data items in the electronic document by keyword search for at least one candidate target data items; select a target data item from the at least one candidate target data items by identifying at least one ancillary data item known to be located within the electronic document proximate to the at least one candidate target data items; generate a target field within the electronic document to encapsulate the at least one data value associated with the selected target data item, the target field having a predetermined height substantially equal to a height of the target data item, the target field extending horizontally within the electronic document in a direction away from the target data item and extending to a predetermined position in the electronic document; compare a format of the at least one data value with a predetermined data value format; and extract the at least one data value from the target field in response to the format of the at least one data value matching the predetermined data value format for storage in a table of a database.

8. The server apparatus according to claim 7, wherein execution of the instructions causes the server apparatus to generate a compound data item field around the candidate target data item and the at least one ancillary data item; compare a format of the compound data item field with a predetermined format; and select, as the target data item, the candidate data item when the format of the compound data item field matches the predetermined format.

9. The server apparatus according to claim 7, wherein execution of the instructions causes the server apparatus to in response to identifying a plurality of candidate data items, generate a data field around each of the candidate data items; extend the data field to form a compound data item field in a direction away from each candidate data item to identify the at least one ancillary data item; and select, as the target data item, the candidate data item within the compound data item field including a predetermined first ancillary data item within a predetermined distance from the respective candidate data item.

10. The server apparatus according to claim 9, wherein execution of the instructions causes the server apparatus to in response to determining that more than one candidate data item is within a predetermined distance to the first ancillary data item, extend the compound data item field further in one of a same direction or different direction to identify at least one further ancillary data item within a predetermined distance from each of the candidate data item and the first ancillary data item; and select, as the target data item, the candidate data item from within the compound data item field that is a predetermined distance from each of the first ancillary data item and at least one further ancillary data item.

11. The server apparatus according to claim 9, wherein execution of the instructions causes the server apparatus to receive electronic document data; and perform an optical character recognition process on the electronic document data to create the data object.

12. The server apparatus according to claim 7, wherein generation of the a target field further includes creating the target field having a predetermined height substantially equal to a height of the target data item, the target field beginning at a position in the electronic document a predetermined distance from a margin thereof and aligned with the selected target data item, the target field begin sequentially extend
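
Claims 3 and 4 describe how one candidate is chosen when the keyword search finds several: first by a predetermined ancillary item within a predetermined distance, then, if more than one candidate still qualifies, by a further ancillary item close to both the candidate and the first ancillary item. The sketch below is one possible reading, not the patented implementation. It swaps the claimed field extension for a plain Euclidean distance check between item positions, and `Item`, `nearest`, `select_target`, and the sample labels are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Item:
    """A recognized text item and its position (hypothetical model)."""
    text: str
    x: float
    y: float


def distance(a, b):
    """Euclidean distance between two item positions (an assumed metric)."""
    return hypot(a.x - b.x, a.y - b.y)


def nearest(anchor, items, label, max_dist):
    """Closest item with the given text within max_dist of anchor, or None."""
    matches = [o for o in items if o.text == label and distance(anchor, o) <= max_dist]
    return min(matches, key=lambda o: distance(anchor, o), default=None)


def select_target(candidates, items, first_label, further_label, max_dist):
    # Keep candidates that have the first ancillary item within max_dist.
    pairs = []
    for cand in candidates:
        first = nearest(cand, items, first_label, max_dist)
        if first is not None:
            pairs.append((cand, first))
    if len(pairs) == 1:
        return pairs[0][0]
    # Still ambiguous: require a further ancillary item within max_dist of
    # both the candidate and its first ancillary item.
    confirmed = [
        cand for cand, first in pairs
        if any(o.text == further_label
               and distance(cand, o) <= max_dist
               and distance(first, o) <= max_dist
               for o in items)
    ]
    return confirmed[0] if len(confirmed) == 1 else None


# Hypothetical layout: two "Total" labels, each followed by "Amount";
# only the first also has "Due" nearby.
items = [
    Item("Total", 10, 100), Item("Amount", 60, 100), Item("Due", 40, 115),
    Item("Total", 10, 200), Item("Amount", 60, 200),
]
candidates = [i for i in items if i.text == "Total"]
target = select_target(candidates, items, "Amount", "Due", max_dist=60.0)
print(target)
```

Returning `None` when the ambiguity cannot be resolved is a design choice of this sketch; the claims do not say what happens in that case.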

Assignees

Canon Information & Imaging Solutions Inc, Canon Usa Inc

Inventors

Matsumoto Hideaki

Classifications

G06V30/412 (Primary)
Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables · CPC title
G06V30/414
Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title
G06V30/224
of printed characters having additional code marks or containing code marks · CPC title
H04W12/06
Authentication · CPC title
G06F16/93
Document management systems · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 57015935

Related patents

Feedback validation of electronically generated forms

Systems, methods and computer program products for determining document validity

Document processing
