Who is the assignee on this patent?

Beijing Baidu Netcom Sci & Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06V10/82. Mapped technology areas include Physics.

When was this patent published?

Published Dec 26, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method, apparatus, device and storage medium for recognizing bill image

US11854246B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11854246-B2
Application number	US-202117201733-A
Country	US
Kind code	B2
Filing date	Mar 15, 2021
Priority date	Jun 9, 2020
Publication date	Dec 26, 2023
Grant date	Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, apparatus, device and storage medium for recognizing a bill image may include: performing text detection on a bill image, and determining an attribute information set and a relationship information set of each text box of at least two text boxes in the bill image; determining a type of the text box and an associated text box that has a structural relationship with the text box based on the attribute information set and the relationship information set of the text box; and extracting structured bill data of the bill image, based on the type of the text box and the associated text box that has the structural relationship with the text box.
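As a rough illustration of the first step, the sketch below computes a pairwise relationship information set for two text boxes. Claim 9 on this page lists its contents as a position coordinate difference, a center point angle difference and a center point Euclidean distance. The exact formulas are not shown here, so the `TextBox` class, the axis-aligned box layout, and the reading of the angle as the direction of the line joining the two centers are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class TextBox:
    # Hypothetical axis-aligned box; the patent's coordinate format is not shown on this page.
    x0: float
    y0: float
    x1: float
    y1: float
    text: str

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def relationship_info(a: TextBox, b: TextBox) -> dict:
    """Pairwise geometry between two text boxes (illustrative definitions)."""
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = bx - ax, by - ay
    return {
        # Corner-by-corner coordinate difference between the two boxes.
        "coord_diff": (b.x0 - a.x0, b.y0 - a.y0, b.x1 - a.x1, b.y1 - a.y1),
        # Assumed reading: angle (radians) of the line from a's center to b's center.
        "center_angle": math.atan2(dy, dx),
        # Euclidean distance between the two center points.
        "center_distance": math.hypot(dx, dy),
    }
```

For example, `relationship_info(TextBox(0, 0, 2, 2, "Name"), TextBox(3, 4, 5, 6, "Alice"))` describes a value box that sits below and to the right of its label.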

First claim

Opening claim text (preview).

What is claimed is:

1. A method for recognizing a bill image, the method comprising: performing text detection on a bill image, and determining an attribute information set and a relationship information set of each text box of at least two text boxes in the bill image; for at least some of the text boxes, determining a type of the text box and an associated text box that has a structural relationship with the text box based on the attribute information set and the relationship information set of the text box; and extracting structured bill data of the bill image, based on the type of the text box and the associated text box that has the structural relationship with the text box, wherein the determining the type of the text box and the associated text box that has the structural relationship with the text box based on the attribute information set and the relationship information set of the text box, comprises: determining an attribute feature set and a relationship feature set of the text box based on the attribute information set and the relationship information set of the text box; determining a type probability of the text box and a relationship probability between different text boxes, based on the attribute feature set and the relationship feature set of the text box; and determining the type of the text box and the associated text box that has the structural relationship with the text box, based on the type probability of the text box and the relationship probability between different text boxes.

2. The method according to claim 1, wherein, the type of the text box comprises a field attribute type, a field value type, a table header type, or a table cell type; text boxes of the field attribute type and the field value type have a field structural relationship; and text boxes of the table header type and the table cell type have a table structural relationship.

3. The method according to claim 1, wherein, the determining the attribute feature set and the relationship feature set of the text box based on the attribute information set and the relationship information set of the text box, comprises: determining a visual feature of the text box based on an image area in the attribute information set of the text box; determining a semantic feature of the text box based on a text content in the attribute information set of the text box; using the visual feature, the semantic feature, and position coordinates in the attribute information set as the attribute feature set of the text box; and determining the relationship feature set of the text box based on the attribute feature set and the relationship information set.

4. The method according to claim 1, wherein, the determining the type probability of the text box and the relationship probability between different text boxes, based on the attribute feature set and the relationship feature set of the text box, comprises: inputting the attribute feature set and the relationship feature set of the text box into a probability prediction network to obtain the type probability of the text box and the relationship probability between different text boxes.

5. The method according to claim 4, wherein the probability prediction network comprises at least one sub-prediction network connected end to end; correspondingly, the inputting the attribute feature set and the relationship feature set of the text box into the probability prediction network to obtain the type probability of the text box and the relationship probability between different text boxes, comprises: inputting the relationship feature set of the text box into a first perceptron of a current sub-prediction network to obtain a current perception probability; inputting the current perception probability and the attribute feature set of the text box into a first hidden layer of the current sub-prediction network to obtain a first hidden text feature; and inputting the first hidden text feature and the attribute feature set into a long short-term memory network layer of the current sub-prediction network to obtain the type probability of the text box, in response to determining that the current sub-prediction network is a final sub-prediction network, and using the current perception probability as the relationship probability between different text boxes.

6. The method according to claim 5, wherein after the inputting the current perception probability and the attribute feature set of the text box into the first hidden layer of the current sub-prediction network to obtain the first hidden text feature, the method further comprises: inputting the first hidden text feature and the attribute feature set into the long short-term memory network layer of the current sub-prediction network to obtain an updated attribute feature set of the text box, in response to determining that the current sub-prediction network is not the final sub-prediction network, and inputting the updated attribute feature set into a subsequent sub-prediction network; inputting the first hidden text feature and the relationship feature set into a second hidden layer of the current sub-prediction network to obtain a second hidden text feature; and inputting the second hidden text feature into a second perceptron of the current sub-prediction network to obtain an updated relationship feature set of the text box, and inputting the updated relationship feature set into a subsequent sub-prediction network.

7. The method according to claim 1, wherein the determining the type of the text box and the associated text box that has the structural relationship with the text box, based on the type probability of the text box and the relationship probability between different text boxes, comprises: determining the type of the text box based on the type probability of the text box; determining a candidate text box pair having the structural relationship, based on the relationship probability between different text boxes and a probability threshold; and determining the associated text box that has the structural relationship with the text box, based on the candidate text box pair and the type of the text box.

8. The method according to claim 7, wherein after the determining the associated text box that has the structural relationship with the text box, based on the candidate text box pair and the type of the text box, the method further comprises: determining whether the text box is a preset type, in response to determining that at least two associated text boxes have the structural relationship with the text box; and in response to determining that the text box is the preset type, determining an associated text box having a highest relationship probability with the text box in the at least two associated text boxes as a final associated text box that has the structural relationship with the text box.

9. The method according to claim 1, wherein, the attribute information set of the text box comprises position coordinates, an image area, and a text content of the text box; and the relationship information set of the text box comprises a position coordinate difference, a center point angle difference and a center point Euclidean distance between the text box and another text box.

10. The method according to claim 1, wherein the performing text detection on the bill image, and determining the attribute information set and the relationship information set of each text box of at least two text boxes in the bill image, comprises: performing text detection on the bill image to obtain position coordinates of each text box of the at least two text boxes in the bill image; performing distortion correction on the position coordinates of
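Claims 7 and 8 describe turning the predicted probabilities into decisions: pick a type per box, keep box pairs whose relationship probability passes a threshold, and, for boxes of a preset type with several partners, keep only the most probable one. A minimal sketch of that decoding step follows. The function name `decode`, the 0.5 threshold, the choice of `field_value` as the preset type, and the filtering of candidate pairs by the type pairings from claim 2 are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical labels for the four text box types named in claim 2.
TYPES = ["field_attribute", "field_value", "table_header", "table_cell"]

# Type pairings that claim 2 says have a structural relationship.
ALLOWED = {
    frozenset(("field_attribute", "field_value")),
    frozenset(("table_header", "table_cell")),
}


def decode(type_prob, rel_prob, threshold=0.5, preset_types=("field_value",)):
    """Turn probabilities into box types and associated boxes.

    type_prob: one list of four type probabilities per text box.
    rel_prob:  symmetric n x n list of pairwise relationship probabilities.
    Returns (types, links), where links[i] lists boxes associated with box i.
    """
    n = len(type_prob)
    # Type of each box: the label with the highest probability.
    types = [TYPES[max(range(len(TYPES)), key=row.__getitem__)] for row in type_prob]

    # Candidate pairs: probability at or above the threshold (assumed inclusive),
    # kept only when the two types can be structurally related.
    links = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rel_prob[i][j] >= threshold and frozenset((types[i], types[j])) in ALLOWED:
                links[i].append(j)
                links[j].append(i)

    # Boxes of a preset type keep only their most probable partner.
    for i in range(n):
        if types[i] in preset_types and len(links[i]) > 1:
            best = max(links[i], key=lambda j: rel_prob[i][j])
            for j in links[i]:
                if j != best:
                    links[j].remove(i)
            links[i] = [best]
    return types, links


# Three boxes: two candidate field names competing for one value.
type_prob = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.80, 0.05, 0.05],
    [0.70, 0.20, 0.05, 0.05],
]
rel_prob = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.6],
    [0.8, 0.6, 0.0],
]
types, links = decode(type_prob, rel_prob)
```

In this example box 1 (a field value) initially links to both box 0 and box 2, and the pruning step keeps only box 0, the partner with the higher relationship probability.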

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06T2207/20084
Artificial neural networks [ANN] · CPC title
G06T2207/20081
Training; Learning · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 72539524

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11854246B2 cover?: A method, apparatus, device and storage medium for recognizing a bill image may include: performing text detection on a bill image, and determining an attribute information set and a relationship information set of each text box of at least two text boxes in the bill image; determining a type of the text box and an associated text box that has a structural relationship with the text box based o…

Related patents

Unsupervised domain adaptation from generic forms for new ocr forms

Data extraction and duplicate detection

Form image field extraction

Facilitating presentation of content relating to a financial transaction

Table recognizing method and table recognizing system
