What technology area does this patent fall under?

Primary CPC classification G06V30/18152. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 8, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (either citations found in our corpus or other publications sharing the same primary CPC).

Detecting fields in document images

US12354397B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12354397-B2
Application number	US-202318502343-A
Country	US
Kind code	B2
Filing date	Nov 6, 2023
Priority date	Jul 21, 2021
Publication date	Jul 8, 2025
Grant date	Jul 8, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of detecting fields in document images includes: receiving a codebook comprising a set of visual words, each visual word corresponding to a center of a cluster of local descriptors; calculating, based on a set of user labeled document images, for each visual word of the codebook, a respective frequency distribution of a field position of a specified labeled field with respect to the visual word; loading a document image for extraction of target fields; calculating a statistical predicate of a possible position of a target field in the document image based on the frequency distributions; and detecting, using the trained model, fields in the document image based on the calculated statistical predicate.
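The abstract describes a voting scheme: each visual word learns where a labeled field tends to sit relative to it, and at extraction time those learned offsets are combined into a predicate over possible field positions. The following is a minimal Python sketch under stated assumptions, not the patented implementation. Keypoints are assumed to be already detected and assigned to visual words (given as `(word_id, x, y)` tuples), positions are quantized into square cells of a hypothetical size `CELL`, and the combined predicate is a plain (optionally weighted) sum of each word's normalized shift histogram, in the spirit of claims 4 and 7. The function names are illustrative, and the page does not say how the final detection step consumes the predicate.

```python
from collections import Counter, defaultdict

# Hypothetical grid cell size in pixels used to quantize positions;
# the patent text shown here does not specify one.
CELL = 10


def quantize(v):
    """Map a pixel coordinate to a grid cell index."""
    return v // CELL


def train_shift_histograms(labeled_docs):
    """Build, for each visual word, a frequency distribution (Counter) of the
    labeled field's cell offset relative to that word's cell.

    labeled_docs: list of (keypoints, (fx, fy)), where keypoints is a list of
    (word_id, x, y) tuples and (fx, fy) is the labeled field position.
    """
    hists = defaultdict(Counter)
    for keypoints, (fx, fy) in labeled_docs:
        for word, x, y in keypoints:
            shift = (quantize(fx) - quantize(x), quantize(fy) - quantize(y))
            hists[word][shift] += 1
    return hists


def predict_field_cell(hists, keypoints, weights=None):
    """Accumulate votes for the field cell in a new document.

    Each detected visual word contributes its normalized shift distribution,
    placed at its own cell; the sum is a (weighted) linear combination of
    per-word predicates. Returns the best-scoring cell, or None if no
    detected word was seen in training.
    """
    votes = Counter()
    for word, x, y in keypoints:
        hist = hists.get(word)  # .get() does not insert into the defaultdict
        if not hist:
            continue
        total = sum(hist.values())
        w = 1.0 if weights is None else weights.get(word, 1.0)
        cx, cy = quantize(x), quantize(y)
        for (dx, dy), count in hist.items():
            votes[(cx + dx, cy + dy)] += w * count / total
    if not votes:
        return None
    return max(votes, key=votes.get)


# Usage: two training documents with the same layout, shifted by 10 px.
training = [
    ([(0, 100, 50), (1, 300, 50)], (200, 120)),
    ([(0, 110, 60), (1, 310, 60)], (210, 130)),
]
hists = train_shift_histograms(training)
new_doc = [(0, 120, 40), (1, 320, 40), (2, 5, 5)]  # word 2 is unseen
print(predict_field_cell(hists, new_doc))  # → (22, 11)
```

Here both known visual words vote for the same cell, so the accumulated histogram peaks there. A practical system would use many more words and could pass the accumulated votes to a trained detector rather than taking a bare argmax.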

First claim

Full claim text (claims 1 to 20).

What is claimed is:

1. A method, comprising: receiving, by a processing device, a codebook comprising a set of visual words, each visual word corresponding to a center of a cluster of local descriptors, wherein each local descriptor is associated with a keypoint region of a first set of document images; calculating, based on a second set of document images, for each visual word of the codebook, a respective frequency distribution of a field position of a specified field with respect to the visual word; loading a document image for extraction of target fields; calculating a statistical predicate of a possible position of a target field in the document image based on the frequency distributions; and detecting fields in the document image based on the calculated statistical predicate.

2. The method of claim 1, wherein the codebook is optimized on a third set of document images.

3. The method of claim 1, wherein calculating the respective frequency distribution comprises calculating an integral two-dimensional histogram of shift of a position of the specified field, and wherein the integral two-dimensional histogram incorporates a plurality of shifts relative to possible positions of each visual word.

4. The method of claim 1, wherein calculating the statistical predicate further comprises: obtaining an accumulated distribution histogram based on possible positions of the target field with respect to two or more visual words of the set of visual words.

5. The method of claim 1, wherein a plurality of document images of the second set of document images have a similar layout.

6. The method of claim 1, further comprising: dividing the second set of document images into groups based on document similarity prior to at least one of: training a model or using the model.

7. The method of claim 1, wherein the statistical predicate is represented by a linear combination of individual predicates corresponding to a plurality of visual words detected in the document image.

8. A system, comprising: a memory; and a processing device coupled to the memory, the processing device configured to: receive a codebook comprising a set of visual words, each visual word corresponding to a center of a cluster of local descriptors, wherein each local descriptor is associated with a keypoint region of a first set of document images; calculate, based on a second set of document images, for each visual word of the codebook, a respective frequency distribution of a field position of a specified field with respect to the visual word; load a document image for extraction of target fields; calculate a statistical predicate of a possible position of a target field in the document image based on the frequency distributions; and detect fields in the document image based on the calculated statistical predicate.

9. The system of claim 8, wherein the codebook is optimized on a third set of document images.

10. The system of claim 8, wherein calculating the respective frequency distribution comprises calculating an integral two-dimensional histogram of shift of a position of the specified field, and wherein the integral two-dimensional histogram incorporates a plurality of shifts relative to possible positions of each visual word.

11. The system of claim 8, wherein calculating the statistical predicate further comprises: obtaining an accumulated distribution histogram based on possible positions of the target field with respect to two or more visual words of the set of visual words.

12. The system of claim 8, wherein a plurality of document images of the second set of document images have a similar layout.

13. The system of claim 8, wherein the processing device is further configured to: dividing the second set of document images into groups based on document similarity prior to at least one of: training a model or using the model.

14. The system of claim 8, wherein the statistical predicate is represented by a linear combination of individual predicates corresponding to a plurality of visual words detected in the document image.

15. A non-transitory computer-readable storage medium comprising executable instructions that, when executed by a processing device, cause the processing device to: receive a codebook comprising a set of visual words, each visual word corresponding to a center of a cluster of local descriptors, wherein each local descriptor is associated with a keypoint region of a first set of document images; calculate, based on a second set of document images, for each visual word of the codebook, a respective frequency distribution of a field position of a specified field with respect to the visual word; load a document image for extraction of target fields; calculate a statistical predicate of a possible position of a target field in the document image based on the frequency distributions; and detect fields in the document image based on the calculated statistical predicate.

16. The non-transitory computer-readable storage medium of claim 15, wherein the codebook is optimized on a third set of document images.

17. The non-transitory computer-readable storage medium of claim 15, wherein calculating the respective frequency distribution comprises calculating an integral two-dimensional histogram of shift of a position of the specified field, and wherein the integral two-dimensional histogram incorporates a plurality of shifts relative to possible positions of each visual word.

18. The non-transitory computer-readable storage medium of claim 15, wherein calculating the statistical predicate further comprises: obtaining an accumulated distribution histogram based on possible positions of the target field with respect to two or more visual words of the set of visual words.

19. The non-transitory computer-readable storage medium of claim 15, wherein a plurality of document images of the second set of document images have a similar layout.

20. The non-transitory computer-readable storage medium of claim 15, further comprising: dividing the second set of document images into groups based on document similarity prior to at least one of: training a model or using the model.
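Claims 3, 10 and 17 recite an "integral two-dimensional histogram" of field shifts, and the text on this page does not define the term further. One plausible reading, assumed here and not confirmed by the source, is an integral histogram in the summed-area-table sense. Under that reading, the total count of shifts inside any rectangular window can be read in constant time, which could, for example, tolerate small layout jitter around a predicted field position. The following is a minimal Python sketch. The bin layout (rows for vertical shift bins, columns for horizontal ones) and the function names are assumptions.

```python
def integral_histogram(hist):
    """Summed-area table of a 2-D count grid.

    hist: list of equal-length rows of counts (rows = vertical shift bins,
    columns = horizontal shift bins; this layout is an assumption).
    Returns S with an extra leading row and column of zeros, so that
    S[r][c] == sum of hist[i][j] for all i < r and j < c.
    """
    rows, cols = len(hist), len(hist[0])
    S = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0  # running sum of hist[r][0..c]
        for c in range(cols):
            row_sum += hist[r][c]
            S[r + 1][c + 1] = S[r][c + 1] + row_sum
    return S


def box_count(S, r0, c0, r1, c1):
    """Total count in hist rows r0..r1-1 and columns c0..c1-1, in O(1)."""
    return S[r1][c1] - S[r0][c1] - S[r1][c0] + S[r0][c0]


# Usage: a 3x3 shift histogram for one visual word.
hist = [
    [1, 2, 0],
    [0, 3, 1],
    [4, 0, 5],
]
S = integral_histogram(hist)
print(box_count(S, 0, 0, 3, 3))  # → 16 (all shifts)
print(box_count(S, 1, 1, 3, 3))  # → 9 (bottom-right 2x2 window)
```

Building the table costs one pass over the histogram. After that, each window query needs only four lookups, regardless of the window size.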

Assignees

Abbyy Dev Inc

Inventors

Classifications

G06V30/18152 (primary): Extracting features based on a plurality of salient regional features, e.g. "bag of words"
G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
G06V30/18143: Extracting features based on salient regional features, e.g. scale invariant feature transform [SIFT] keypoints
G06F18/2163: Partitioning the feature space
G06F18/22: Matching criteria, e.g. proximity measures
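The primary classification ("bag of words") and claim 1 describe the codebook as a set of visual words, each being the center of a cluster of local descriptors. The text shown here does not name the clustering method. The following sketch assumes plain k-means (Lloyd's algorithm) on toy 2-D descriptors, whereas real local descriptors would typically be higher-dimensional, SIFT-like vectors. Initializing from the first `k` descriptors is a simplification for reproducibility, and `build_codebook` and `nearest_word` are hypothetical names. `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_word(centers, descriptor):
    """Index of the closest cluster center (visual word), Euclidean distance."""
    return min(range(len(centers)), key=lambda i: math.dist(centers[i], descriptor))


def build_codebook(descriptors, k, iters=20):
    """Plain Lloyd's k-means; returns k centers used as visual words.

    descriptors: list of equal-length float tuples. Centers are initialized
    from the first k descriptors (a deterministic simplification).
    """
    centers = [list(d) for d in descriptors[:k]]
    for _ in range(iters):
        # Assignment step: each descriptor joins its nearest center.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(centers, d)].append(d)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


# Usage: toy 2-D "descriptors" forming two clear groups.
descriptors = [(0.0, 0.0), (10.0, 10.0), (0.1, 0.2), (9.9, 10.1), (0.2, 0.1), (10.2, 9.8)]
codebook = build_codebook(descriptors, k=2)
print(nearest_word(codebook, (0.5, 0.5)), nearest_word(codebook, (9.0, 9.0)))  # → 0 1
```

Once the codebook exists, `nearest_word` is the quantization step that turns each keypoint's descriptor into a visual-word ID, which is the form of input that the per-word shift histograms are built from.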

Patent family

Related publications grouped by family.

Patent family ID: 84976107


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12354397B2 cover?: A method of detecting fields in document images includes: receiving a codebook comprising a set of visual words, each visual word corresponding to a center of a cluster of local descriptors; calculating, based on a set of user labeled document images, for each visual word of the codebook, a respective frequency distribution of a field position of a specified labeled field with respect to the vi…
Who is the assignee on this patent?: Abbyy Dev Inc


Related patents

Three-dimensional shape expression method and device thereof

Global visual vocabulary, systems and methods

Holistic document search

Image processing and object classification
