Method, apparatus, device and storage medium for recognizing bill image

US11854246B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11854246-B2
Application number  US-202117201733-A
Country             US
Kind code           B2
Filing date         Mar 15, 2021
Priority date       Jun 9, 2020
Publication date    Dec 26, 2023
Grant date          Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, apparatus, device and storage medium for recognizing a bill image may include: performing text detection on a bill image, and determining an attribute information set and a relationship information set of each text box of at least two text boxes in the bill image; determining a type of the text box and an associated text box that has a structural relationship with the text box based on the attribute information set and the relationship information set of the text box; and extracting structured bill data of the bill image, based on the type of the text box and the associated text box that has the structural relationship with the text box.
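
As an illustration of the "relationship information set" named in the abstract: claim 9 of this patent lists it as a position coordinate difference, a center point angle difference, and a center point Euclidean distance between one text box and another. The following is a minimal sketch of those three quantities, not the patent's implementation. The `TextBox` class, the axis-aligned box layout, the function name `relationship_info`, and the reading of the "angle difference" as the angle of the line joining the two centers are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class TextBox:
    # Hypothetical container: axis-aligned box given by its top-left
    # (x1, y1) and bottom-right (x2, y2) corners, plus recognized text.
    x1: float
    y1: float
    x2: float
    y2: float
    text: str = ""

    def center(self):
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)


def relationship_info(a, b):
    """Pairwise relationship information from box a to box b (assumed layout)."""
    # Position coordinate difference, corner by corner.
    coord_diff = (b.x1 - a.x1, b.y1 - a.y1, b.x2 - a.x2, b.y2 - a.y2)
    (ax, ay), (bx, by) = a.center(), b.center()
    # Assumption: the angle term is the direction of the center-to-center line.
    center_angle = math.atan2(by - ay, bx - ax)
    # Euclidean distance between the two center points.
    center_distance = math.hypot(bx - ax, by - ay)
    return {
        "coord_diff": coord_diff,
        "center_angle": center_angle,
        "center_distance": center_distance,
    }
```

Calling `relationship_info(TextBox(0, 0, 2, 2), TextBox(3, 4, 5, 6))` would give a center distance of 5.0, since the centers are (1, 1) and (4, 5).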

First claim

Opening claim text (preview).

What is claimed is:

1. A method for recognizing a bill image, the method comprising: performing text detection on a bill image, and determining an attribute information set and a relationship information set of each text box of at least two text boxes in the bill image; for at least some of the text boxes, determining a type of the text box and an associated text box that has a structural relationship with the text box based on the attribute information set and the relationship information set of the text box; and extracting structured bill data of the bill image, based on the type of the text box and the associated text box that has the structural relationship with the text box, wherein the determining the type of the text box and the associated text box that has the structural relationship with the text box based on the attribute information set and the relationship information set of the text box, comprises: determining an attribute feature set and a relationship feature set of the text box based on the attribute information set and the relationship information set of the text box; determining a type probability of the text box and a relationship probability between different text boxes, based on the attribute feature set and the relationship feature set of the text box; and determining the type of the text box and the associated text box that has the structural relationship with the text box, based on the type probability of the text box and the relationship probability between different text boxes.

2. The method according to claim 1, wherein, the type of the text box comprises a field attribute type, a field value type, a table header type, or a table cell type; text boxes of the field attribute type and the field value type have a field structural relationship; and text boxes of the table header type and the table cell type have a table structural relationship.

3. The method according to claim 1, wherein, the determining the attribute feature set and the relationship feature set of the text box based on the attribute information set and the relationship information set of the text box, comprises: determining a visual feature of the text box based on an image area in the attribute information set of the text box; determining a semantic feature of the text box based on a text content in the attribute information set of the text box; using the visual feature, the semantic feature, and position coordinates in the attribute information set as the attribute feature set of the text box; and determining the relationship feature set of the text box based on the attribute feature set and the relationship information set.

4. The method according to claim 1, wherein, the determining the type probability of the text box and the relationship probability between different text boxes, based on the attribute feature set and the relationship feature set of the text box, comprises: inputting the attribute feature set and the relationship feature set of the text box into a probability prediction network to obtain the type probability of the text box and the relationship probability between different text boxes.

5. The method according to claim 4, wherein the probability prediction network comprises at least one sub-prediction network connected end to end; correspondingly, the inputting the attribute feature set and the relationship feature set of the text box into the probability prediction network to obtain the type probability of the text box and the relationship probability between different text boxes, comprises: inputting the relationship feature set of the text box into a first perceptron of a current sub-prediction network to obtain a current perception probability; inputting the current perception probability and the attribute feature set of the text box into a first hidden layer of the current sub-prediction network to obtain a first hidden text feature; and inputting the first hidden text feature and the attribute feature set into a long short-term memory network layer of the current sub-prediction network to obtain the type probability of the text box, in response to determining that the current sub-prediction network is a final sub-prediction network, and using the current perception probability as the relationship probability between different text boxes.

6. The method according to claim 5, wherein after the inputting the current perception probability and the attribute feature set of the text box into the first hidden layer of the current sub-prediction network to obtain the first hidden text feature, the method further comprises: inputting the first hidden text feature and the attribute feature set into the long short-term memory network layer of the current sub-prediction network to obtain an updated attribute feature set of the text box, in response to determining that the current sub-prediction network is not the final sub-prediction network, and inputting the updated attribute feature set into a subsequent sub-prediction network; inputting the first hidden text feature and the relationship feature set into a second hidden layer of the current sub-prediction network to obtain a second hidden text feature; and inputting the second hidden text feature into a second perceptron of the current sub-prediction network to obtain an updated relationship feature set of the text box, and inputting the updated relationship feature set into a subsequent sub-prediction network.

7. The method according to claim 1, wherein the determining the type of the text box and the associated text box that has the structural relationship with the text box, based on the type probability of the text box and the relationship probability between different text boxes, comprises: determining the type of the text box based on the type probability of the text box; determining a candidate text box pair having the structural relationship, based on the relationship probability between different text boxes and a probability threshold; and determining the associated text box that has the structural relationship with the text box, based on the candidate text box pair and the type of the text box.

8. The method according to claim 7, wherein after the determining the associated text box that has the structural relationship with the text box, based on the candidate text box pair and the type of the text box, the method further comprises: determining whether the text box is a preset type, in response to determining that at least two associated text boxes have the structural relationship with the text box; and in response to determining that the text box is the preset type, determining an associated text box having a highest relationship probability with the text box in the at least two associated text boxes as a final associated text box that has the structural relationship with the text box.

9. The method according to claim 1, wherein, the attribute information set of the text box comprises position coordinates, an image area, and a text content of the text box; and the relationship information set of the text box comprises a position coordinate difference, a center point angle difference and a center point Euclidean distance between the text box and another text box.

10. The method according to claim 1, wherein the performing text detection on the bill image, and determining the attribute information set and the relationship information set of each text box of at least two text boxes in the bill image, comprises: performing text detection on the bill image to obtain position coordinates of each text box of the at least two text boxes in the bill image; performing distortion correction on the position coordinates of
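
The decoding steps in claims 7 and 8 can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the patented implementation: the function name `decode_structure`, the type labels, the default threshold of 0.5, the symmetric relationship-probability matrix, and the default choice of `preset_types` are all hypothetical. The pairing rule (field attribute with field value, table header with table cell) follows claim 2.

```python
# Type labels follow claim 2; the string names themselves are assumptions.
TYPES = ("field_attribute", "field_value", "table_header", "table_cell")

# Type pairs that may share a structural relationship (claim 2).
COMPATIBLE = {
    frozenset(("field_attribute", "field_value")),
    frozenset(("table_header", "table_cell")),
}


def decode_structure(type_probs, rel_probs, threshold=0.5,
                     preset_types=("field_attribute", "field_value", "table_cell")):
    """Turn probabilities into box types and associated boxes.

    type_probs: one list of len(TYPES) probabilities per text box.
    rel_probs:  square matrix, assumed symmetric, of pairwise probabilities.
    preset_types: hypothetical set of types limited to a single partner.
    """
    n = len(type_probs)
    # Claim 7, step 1: the type is the most probable class.
    types = [TYPES[max(range(len(TYPES)), key=p.__getitem__)] for p in type_probs]

    # Claim 7, steps 2 and 3: threshold the pairs, then keep only pairs
    # whose two types are allowed to be related.
    assoc = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            pair_ok = frozenset((types[i], types[j])) in COMPATIBLE
            if rel_probs[i][j] >= threshold and pair_ok:
                assoc[i].append(j)
                assoc[j].append(i)

    # Claim 8: a box of a preset type with two or more partners keeps only
    # the partner with the highest relationship probability. Only that
    # box's own list is pruned; the dropped partner's list is left as is.
    for i in range(n):
        if len(assoc[i]) >= 2 and types[i] in preset_types:
            best = max(assoc[i], key=lambda j: rel_probs[i][j])
            assoc[i] = [best]
    return types, assoc
```

In this sketch a table header is left out of `preset_types` by default so that one header can stay linked to several cells; whether that matches the patent's own preset types is not stated in the claim text shown here.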

Assignees

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Artificial neural networks [ANN] · CPC title

  • Training; Learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11854246B2 cover?
A method, apparatus, device and storage medium for recognizing a bill image may include: performing text detection on a bill image, and determining an attribute information set and a relationship information set of each text box of at least two text boxes in the bill image; determining a type of the text box and an associated text box that has a structural relationship with the text box based o…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 26, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).