What technology area does this patent fall under?

Primary CPC classification G06V30/1916. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 19, 2021 (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Automatic key/value pair extraction from document images using deep learning

US10896357B1 · US · B1

Patent metadata
Field	Value
Publication number	US-10896357-B1
Application number	US-201715858976-A
Country	US
Kind code	B1
Filing date	Dec 29, 2017
Priority date	Dec 29, 2017
Publication date	Jan 19, 2021
Grant date	Jan 19, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Key/Value pairs, each comprising a keyword string and an associated value, are extracted automatically from a document image. Each document image has a plurality of pixels with each pixel having a plurality of bits. A first subset of the plurality of bits for each pixel represents information corresponding to the document image. The document image is processed to add information to a second subset of the plurality of bits for each pixel. The information added to the second subset alters the appearance of the document image in a manner that facilitates semantic recognition of textually encoded segments within the document image by a Deep Neural Network (DNN) trained to recognize images within image documents. The DNN detects groupings of text segments within detected spatial templates within the document image. The text segments are mapped to known string values to generate the keyword strings and associated values.
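The two-subset pixel layout in the abstract can be pictured as packing annotation bits next to the original image bits. The following is a minimal sketch, not the patent's implementation. It assumes an 8-bit pixel whose lowest bit holds a binarized page image (claim 14 mentions a single-bit first subset) and whose remaining 7 bits hold a per-pixel annotation such as a keyword probability (as in claim 3). The function names, the bit layout, and the use of NumPy are all illustrative assumptions.

```python
import numpy as np

IMAGE_BITS = 1   # assumed: first subset is 1 bit (ink / no ink)
ANNOT_BITS = 7   # assumed: second subset is the remaining 7 bits of a uint8
LEVELS = (1 << ANNOT_BITS) - 1  # 127 quantization levels for the annotation


def pack_pixels(binary_image, annotation):
    """Pack a 0/1 image and a [0, 1] annotation map into one uint8 array.

    The two bit ranges do not overlap: bit 0 carries the image,
    bits 1..7 carry the quantized annotation.
    """
    image_bit = np.asarray(binary_image, dtype=np.uint8) & 1
    clipped = np.clip(annotation, 0.0, 1.0)
    quantized = np.rint(clipped * LEVELS).astype(np.uint8)
    return ((quantized << IMAGE_BITS) | image_bit).astype(np.uint8)


def unpack_pixels(packed):
    """Recover the image bit and the (quantized) annotation from packed pixels."""
    packed = np.asarray(packed, dtype=np.uint8)
    image_bit = packed & 1
    annotation = (packed >> IMAGE_BITS).astype(np.float64) / LEVELS
    return image_bit, annotation


page = [[1, 0], [0, 1]]                 # hypothetical 2x2 binarized page
keyword_prob = [[1.0, 0.0], [0.0, 0.0]]  # hypothetical annotation map
packed = pack_pixels(page, keyword_prob)
```

Because the subsets occupy disjoint bits, the original image survives unchanged in bit 0 while the upper bits change how the pixel looks to a network that reads the full 8-bit value.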

First claim

Opening claim text (preview).

What is claimed is:

1. A computerized method for identifying keyword strings and associated values from a document image, comprising: receiving the document image from a document storage, wherein the document image comprises a plurality of pixels and wherein each pixel within the document image is represented by a plurality of bits contained in a computer system storage; modifying the document image, from a first version of the document image to a second version of the document image; allocating for each pixel of the plurality of pixels, a first subset of bits representing information corresponding to the second version of the document image; and allocating for each pixel, a second subset of bits wherein the second subset of bits does not overlap with the first subset of bits, and setting the value of each bit within the second subset of bits for each pixel, to represent the second version of the document image in a manner selected to facilitate recognition of textually encoded segments within the document image by a deep neural network trained to recognize objects within an image; detecting by the deep neural network, groupings of text segments in the document image, wherein each grouping of text segments in the document image is associated with a spatial template; and mapping the text segments in the groupings of text segments in the document image to known string values to identify the keyword strings and associated values, wherein the keyword string is representative of semantic meaning of a grouping of text segments.

2. The computerized method of claim 1 wherein the known string values correspond to a known domain associated with the document images.

3. The computerized method of claim 1 wherein modifying the document image from a first version of the document image to a second version of the document image further comprises: processing the document image to recognize textually encoded segments; and annotating the textually encoded segments with a probability value indicative of probability of the text segments representing a known keyword string; and wherein setting the value of each bit within the second subset of bits for each pixel corresponds to the annotated probability value indicative of the probability of the text segments representing a known keyword string.

4. The computerized method of claim 1 wherein processing the document image to modify the bits for each pixel, to allocate for each pixel, a second subset of bits wherein the second subset of bits does not overlap with the first subset of bits, and to add information to the second subset of bits for each pixel, to alter the appearance of the document image in a manner selected to facilitate semantic recognition of textually encoded segments within the document image by a deep neural network trained to recognize objects within an image comprises: blurring the document image by joining neighboring characters in a line and across successive lines in the document image, and converting the document image to mimic a natural image.

5. The computerized method of claim 1 wherein processing the document image to modify the bits for each pixel, to allocate for each pixel, a second subset of bits wherein the second subset of bits does not overlap with the first subset of bits, and to add information to the second subset of bits for each pixel, to alter the appearance of the document image in a manner selected to facilitate semantic recognition of textually encoded segments within the document image by a deep neural network trained to recognize objects within an image comprises: removing backgrounds, patterns, and lines from the document image to generate a noise free, uniform font rendering of the document image.

6. The computerized method of claim 1 wherein processing the document image to modify the bits for each pixel, to allocate for each pixel, a second subset of bits wherein the second subset of bits does not overlap with the first subset of bits, and to add information to the second subset of bits for each pixel, to alter the appearance of the document image in a manner selected to facilitate semantic recognition of textually encoded segments within the document image by a deep neural network trained to recognize objects within an image comprises: adding regular gaussian distributed noise to the document image.

7. The computerized method of claim 1 wherein processing the document image to modify the bits for each pixel, to allocate for each pixel, a second subset of bits wherein the second subset of bits does not overlap with the first subset of bits, and to add information to the second subset of bits for each pixel, to alter the appearance of the document image in a manner selected to facilitate semantic recognition of textually encoded segments within the document image by a deep neural network trained to recognize objects within an image comprises: processing the document image to recognize textually encoded segments; and processing the textually encoded segments in accordance with a list of keyword strings, each of which has associated therewith an occurrence frequency indicative of occurrence frequency of the keyword string within a domain associated with the document image.

8. The computerized method of claim 1 further comprising: splitting the document image into a plurality of overlapping sub-images after processing the document image to add information to a second subset of the plurality of bits for each pixel; and wherein detecting by the deep neural network, groupings of text segments within detected spatial templates within the document image, is performed separately, for each of the sub-images; and wherein the detected groupings of text segments and associated spatial templates in each of the sub-images are joined before mapping the text segments to known string values to generate the keyword strings and associated values.

9. The computerized method of claim 1 wherein the deep neural network detects a plurality of spatial templates for certain of the groupings of text segments, the method further comprising merging the spatial templates for each grouping of text segments to generate a single merged spatial template for each grouping of text segments.

10. The computerized method of claim 9 wherein merging the spatial templates is performed in accordance with a non-maximum suppression algorithm.

11. The computerized method of claim 9 further comprising removing spatial templates characterized by low-confidence.

12. The computerized method of claim 1 wherein mapping the text segments in the groupings of text segments to known string values to generate the keyword strings and associated values, comprises accessing a mapping of known key string values to a semantic key value to associate each keyword string with a value associated with the keyword string.

13. The computerized method of claim 12 further comprising receiving user selection of semantic key values.

14. The computerized method of claim 1 wherein the first subset of bits for each pixel comprises a single bit in the plurality of bits for a pixel and wherein the second subset of pixels for each pixel comprises any additional bits in the plurality of bits for the pixel.

15. A document processing system comprising: data storage for storing a plurality of document images, wherein the document image comprises a plurality of pixels and wherein each pixel within the document image is comprised of a plurality of bits; and a processor operatively coupled to the data storage and configured to execute instructions that when executed cause the processor to: process th
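Claims 9 to 11 describe collapsing several detected spatial templates for the same text grouping into one, using non-maximum suppression and discarding low-confidence detections. The sketch below shows the common greedy form of non-maximum suppression under stated assumptions: spatial templates are modeled as axis-aligned boxes `(x1, y1, x2, y2)` with a confidence score, and the thresholds `iou_threshold=0.5` and `min_score=0.3` are illustrative values, not figures from the patent. Note that greedy suppression keeps the highest-scoring box rather than averaging overlapping boxes; the patent text shown here does not say which variant is used.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_templates(detections, iou_threshold=0.5, min_score=0.3):
    """Greedy non-maximum suppression over (box, score) pairs.

    1. Drop detections whose score is below min_score (low confidence).
    2. Visit the rest from highest to lowest score, keeping a box only
       if it does not overlap an already kept box by more than
       iou_threshold.
    Returns the kept (box, score) pairs, best first.
    """
    confident = [d for d in detections if d[1] >= min_score]
    confident.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in confident:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: two overlapping boxes for one grouping,
# one separate grouping, and one low-confidence box.
detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),
    ((50, 50, 60, 60), 0.7),
    ((100, 100, 110, 110), 0.1),
]
merged = merge_templates(detections)
```

With these inputs, the second box overlaps the first with an IoU of about 0.68 and is suppressed, and the last box is removed by the confidence filter, leaving one template per grouping.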

Assignees

Automation Anywhere Inc

Inventors

Classifications

G06V30/1916 (Primary) · Validation; Performance evaluation
G06V30/274 · Syntactic or semantic context, e.g. balancing
G06F18/217 · Validation; Performance evaluation; Active pattern learning techniques
G06V30/10 · Character recognition
G06V30/414 · Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
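A CPC symbol such as G06V30/1916 is hierarchical: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. The sketch below splits a symbol into those parts and looks up the section title; section G is "Physics" in the CPC scheme, which is presumably where the "Physics" technology area on this page comes from. The function name `parse_cpc` and the returned dictionary layout are illustrative assumptions, and the pattern is a simplified reading of the symbol format rather than a full validator.

```python
import re

# Top-level CPC section titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# section letter, 2-digit class, subclass letter, main group, "/", subgroup
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol):
    """Split a CPC symbol like 'G06V30/1916' into its hierarchical parts."""
    match = CPC_PATTERN.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


primary = parse_cpc("G06V30/1916")
```

Here `primary["subclass"]` is `"G06V"` and `primary["section_title"]` is `"Physics"`; the secondary code G06F18/217 parses the same way and also falls under section G.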

Patent family

Related publications grouped by family.

View patent family 74180567

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10896357B1 cover?: Key/Value pairs, each comprising a keyword string and an associated value, are extracted automatically from a document image. Each document image has a plurality of pixels with each pixel having a plurality of bits. A first subset of the plurality of bits for each pixel represents information corresponding to the document image. The document image is processed to add information to a second sub…
Who is the assignee on this patent?: Automation Anywhere Inc