Unsupervised domain adaptation from generic forms for new ocr forms
US-2020160086-A1 · May 21, 2020 · US
US11854246B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11854246-B2 |
| Application number | US-202117201733-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 15, 2021 |
| Priority date | Jun 9, 2020 |
| Publication date | Dec 26, 2023 |
| Grant date | Dec 26, 2023 |
Official abstract text for this publication.
A method, apparatus, device and storage medium for recognizing a bill image may include: performing text detection on a bill image, and determining an attribute information set and a relationship information set of each text box of at least two text boxes in the bill image; determining a type of the text box and an associated text box that has a structural relationship with the text box based on the attribute information set and the relationship information set of the text box; and extracting structured bill data of the bill image, based on the type of the text box and the associated text box that has the structural relationship with the text box.
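As a rough illustration of the relationship information set mentioned in the abstract, the sketch below computes the three pairwise quantities that claim 9 lists: a position coordinate difference, a center point angle difference, and a center point Euclidean distance. The `TextBox` class, the axis-aligned box layout, and the reading of the "angle difference" as the direction of the line joining the two centers are all assumptions made for this example; the publication excerpt does not spell out these details.

```python
import math
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical container for part of a text box's attribute information."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    text: str

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def relationship_info(a, b):
    """Pairwise relationship information of box b relative to box a."""
    # Coordinate difference, taken corner by corner (an assumed convention).
    coord_diff = (
        b.x_min - a.x_min,
        b.y_min - a.y_min,
        b.x_max - a.x_max,
        b.y_max - a.y_max,
    )
    (ax, ay), (bx, by) = a.center, b.center
    # Assumed interpretation: angle of the segment from a's center to b's center.
    angle = math.atan2(by - ay, bx - ax)
    distance = math.hypot(bx - ax, by - ay)
    return {
        "coord_diff": coord_diff,
        "center_angle": angle,
        "center_distance": distance,
    }


# Two boxes on the same text line: a label and the value to its right.
label = TextBox(0, 0, 40, 10, "Total")
value = TextBox(50, 0, 90, 10, "42.00")
info = relationship_info(label, value)
```

For these two boxes the centers sit on the same horizontal line, so the angle is zero and the distance equals the horizontal offset between the centers.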
Opening claim text (preview).
What is claimed is:

1. A method for recognizing a bill image, the method comprising: performing text detection on a bill image, and determining an attribute information set and a relationship information set of each text box of at least two text boxes in the bill image; for at least some of the text boxes, determining a type of the text box and an associated text box that has a structural relationship with the text box based on the attribute information set and the relationship information set of the text box; and extracting structured bill data of the bill image, based on the type of the text box and the associated text box that has the structural relationship with the text box, wherein the determining the type of the text box and the associated text box that has the structural relationship with the text box based on the attribute information set and the relationship information set of the text box, comprises: determining an attribute feature set and a relationship feature set of the text box based on the attribute information set and the relationship information set of the text box; determining a type probability of the text box and a relationship probability between different text boxes, based on the attribute feature set and the relationship feature set of the text box; and determining the type of the text box and the associated text box that has the structural relationship with the text box, based on the type probability of the text box and the relationship probability between different text boxes.

2. The method according to claim 1, wherein, the type of the text box comprises a field attribute type, a field value type, a table header type, or a table cell type; text boxes of the field attribute type and the field value type have a field structural relationship; and text boxes of the table header type and the table cell type have a table structural relationship.

3. The method according to claim 1, wherein, the determining the attribute feature set and the relationship feature set of the text box based on the attribute information set and the relationship information set of the text box, comprises: determining a visual feature of the text box based on an image area in the attribute information set of the text box; determining a semantic feature of the text box based on a text content in the attribute information set of the text box; using the visual feature, the semantic feature, and position coordinates in the attribute information set as the attribute feature set of the text box; and determining the relationship feature set of the text box based on the attribute feature set and the relationship information set.

4. The method according to claim 1, wherein, the determining the type probability of the text box and the relationship probability between different text boxes, based on the attribute feature set and the relationship feature set of the text box, comprises: inputting the attribute feature set and the relationship feature set of the text box into a probability prediction network to obtain the type probability of the text box and the relationship probability between different text boxes.

5. The method according to claim 4, wherein the probability prediction network comprises at least one sub-prediction network connected end to end; correspondingly, the inputting the attribute feature set and the relationship feature set of the text box into the probability prediction network to obtain the type probability of the text box and the relationship probability between different text boxes, comprises: inputting the relationship feature set of the text box into a first perceptron of a current sub-prediction network to obtain a current perception probability; inputting the current perception probability and the attribute feature set of the text box into a first hidden layer of the current sub-prediction network to obtain a first hidden text feature; and inputting the first hidden text feature and the attribute feature set into a long short-term memory network layer of the current sub-prediction network to obtain the type probability of the text box, in response to determining that the current sub-prediction network is a final sub-prediction network, and using the current perception probability as the relationship probability between different text boxes.

6. The method according to claim 5, wherein after the inputting the current perception probability and the attribute feature set of the text box into the first hidden layer of the current sub-prediction network to obtain the first hidden text feature, the method further comprises: inputting the first hidden text feature and the attribute feature set into the long short-term memory network layer of the current sub-prediction network to obtain an updated attribute feature set of the text box, in response to determining that the current sub-prediction network is not the final sub-prediction network, and inputting the updated attribute feature set into a subsequent sub-prediction network; inputting the first hidden text feature and the relationship feature set into a second hidden layer of the current sub-prediction network to obtain a second hidden text feature; and inputting the second hidden text feature into a second perceptron of the current sub-prediction network to obtain an updated relationship feature set of the text box, and inputting the updated relationship feature set into a subsequent sub-prediction network.

7. The method according to claim 1, wherein the determining the type of the text box and the associated text box that has the structural relationship with the text box, based on the type probability of the text box and the relationship probability between different text boxes, comprises: determining the type of the text box based on the type probability of the text box; determining a candidate text box pair having the structural relationship, based on the relationship probability between different text boxes and a probability threshold; and determining the associated text box that has the structural relationship with the text box, based on the candidate text box pair and the type of the text box.

8. The method according to claim 7, wherein after the determining the associated text box that has the structural relationship with the text box, based on the candidate text box pair and the type of the text box, the method further comprises: determining whether the text box is a preset type, in response to determining that at least two associated text boxes have the structural relationship with the text box; and in response to determining that the text box is the preset type, determining an associated text box having a highest relationship probability with the text box in the at least two associated text boxes as a final associated text box that has the structural relationship with the text box.

9. The method according to claim 1, wherein, the attribute information set of the text box comprises position coordinates, an image area, and a text content of the text box; and the relationship information set of the text box comprises a position coordinate difference, a center point angle difference and a center point Euclidean distance between the text box and another text box.

10. The method according to claim 1, wherein the performing text detection on the bill image, and determining the attribute information set and the relationship information set of each text box of at least two text boxes in the bill image, comprises: performing text detection on the bill image to obtain position coordinates of each text box of the at least two text boxes in the bill image; performing distortion correction on the position coordinates of
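
The decoding step described in claims 7 and 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `decode`, the use of an argmax to pick the type, the threshold of 0.5, the symmetric probability matrix, and the choice of `field_value` as the "preset type" are all hypothetical. Only the four type names and the two compatible pairings come from claim 2.

```python
TYPES = ("field_attribute", "field_value", "table_header", "table_cell")

# Type pairings that may form a structural relationship (per claim 2).
COMPATIBLE = {
    frozenset(("field_attribute", "field_value")),
    frozenset(("table_header", "table_cell")),
}


def decode(type_probs, rel_probs, threshold=0.5, preset_types=("field_value",)):
    """Turn type and relationship probabilities into types and associations.

    type_probs: one row per text box, one probability per entry of TYPES.
    rel_probs:  square matrix of pairwise probabilities, assumed symmetric.
    """
    n = len(type_probs)
    # Assumption: the type is the most probable class.
    types = [TYPES[max(range(len(TYPES)), key=row.__getitem__)] for row in type_probs]

    associates = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # A candidate pair must pass the probability threshold ...
            is_candidate = rel_probs[i][j] >= threshold
            # ... and is kept only if the two types can be related.
            if is_candidate and frozenset((types[i], types[j])) in COMPATIBLE:
                associates[i].append(j)
                associates[j].append(i)

    # For a preset type with several associates, keep only the most probable
    # one. The pruning is applied per box and is not mirrored onto the
    # partner's own list.
    for i, linked in associates.items():
        if len(linked) >= 2 and types[i] in preset_types:
            associates[i] = [max(linked, key=lambda j: rel_probs[i][j])]
    return types, associates


# Three boxes: two candidate labels competing for one value.
type_probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.80, 0.05, 0.05],
    [0.70, 0.10, 0.10, 0.10],
]
rel_probs = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.6],
    [0.2, 0.6, 0.0],
]
types, associates = decode(type_probs, rel_probs)
```

In this example the value box (index 1) passes the threshold with both label boxes, and the pruning step keeps only the label with the higher relationship probability (index 0).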
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title