Method and apparatus of image-to-document conversion based on OCR, device, and readable storage medium
US-2021256253-A1 · Aug 19, 2021 · US
US11756170B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11756170-B2 |
| Application number | US-202117151783-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 19, 2021 |
| Priority date | Jan 20, 2020 |
| Publication date | Sep 12, 2023 |
| Grant date | Sep 12, 2023 |
Official abstract text for this publication.
Embodiments of the present disclosure provide a method and apparatus for correcting a distorted document image, where the method for correcting a distorted document image includes: obtaining a distorted document image; and inputting the distorted document image into a correction model, and obtaining a corrected image corresponding to the distorted document image; where the correction model is a model obtained by training with a set of image samples as inputs and a corrected image corresponding to each image sample in the set of image samples as an output, and the image samples are distorted. By inputting the distorted document image to be corrected into the correction model, the corrected image corresponding to the distorted document image can be obtained through the correction model, which realizes document image correction end-to-end, improves accuracy of the document image correction, and extends application scenarios of the document image correction.
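The flow in the abstract (a distorted image goes in, a corrected image comes out) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that the "deformation parameter of each pixel" named in the claims is a per-pixel `(dy, dx)` source offset, which the source does not specify. The names `predict_shift_right`, `apply_offsets`, `correct`, and `batch_size` are hypothetical, and a hand-written toy function stands in for the trained prediction network.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]                # grayscale rows (hypothetical representation)
Offsets = List[List[Tuple[int, int]]]  # per-pixel (dy, dx); an assumed parameterisation


def predict_shift_right(image: Image) -> Offsets:
    """Toy stand-in for a trained prediction module.

    Every output pixel is told to read from one column to its right.
    """
    return [[(0, 1) for _ in row] for row in image]


def apply_offsets(image: Image, offsets: Offsets,
                  batch_size: int = 4, fill: int = 255) -> Image:
    """Build the corrected image by copying each pixel from its offset source.

    Pixels are visited in groups of `batch_size`, loosely mirroring an
    operating parameter that sets how many pixels are corrected together.
    The groups run sequentially here; a real system could run them in parallel.
    """
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    coords = [(y, x) for y in range(height) for x in range(width)]
    for start in range(0, len(coords), batch_size):
        for y, x in coords[start:start + batch_size]:
            dy, dx = offsets[y][x]
            sy, sx = y + dy, x + dx
            # Sources that fall outside the image keep the fill value.
            if 0 <= sy < height and 0 <= sx < width:
                out[y][x] = image[sy][sx]
    return out


def correct(image: Image, predictor: Callable[[Image], Offsets]) -> Image:
    """End-to-end call: predict per-pixel parameters, then apply them."""
    return apply_offsets(image, predictor(image))


# Content that was pushed one column to the right is pulled back.
distorted = [[255, 10, 20], [255, 40, 50]]
print(correct(distorted, predict_shift_right))  # [[10, 20, 255], [40, 50, 255]]
```

Keeping prediction and correction as separate functions mirrors the two modules connected in series, so the toy predictor could be swapped for a learned model without touching the correction step.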
Opening claim text (preview).
What is claimed is:
1. A method for correcting a distorted document image, comprising: obtaining a distorted document image; and inputting the distorted document image into a correction model, and obtaining a corrected image corresponding to the distorted document image; wherein the correction model is a model obtained by training with a set of image samples as inputs and a corrected image corresponding to each image sample in the set of image samples as an output, and the image samples are distorted, wherein the correction model comprises a deformation parameter prediction module and a deformation correction module connected in series; wherein the deformation parameter prediction module is a U-shaped convolutional neural network model obtained by training with the set of image samples as inputs and a deformation parameter of each pixel of each image sample comprised in the set of image samples as an output, and the deformation correction module is a model obtained by training with the set of image samples and output results of the deformation parameter prediction module as inputs and the corrected image corresponding to each image sample in the set of image samples as an output; the inputting the distorted document image into the correction model, and obtaining the corrected image corresponding to the distorted document image comprises: inputting the distorted document image into the correction model, outputting an intermediate result through the deformation parameter prediction module, and obtaining, according to the intermediate result, the corrected image corresponding to the distorted document image through the deformation correction module; the intermediate result comprising a deformation parameter of each pixel in the distorted document image; wherein the deformation parameter prediction module comprises at least two stages of deformation parameter prediction sub-modules connected in series; wherein a first-stage deformation parameter prediction sub-module is a U-shaped convolutional neural network model obtained by training with the set of image samples as inputs and a deformation parameter of each pixel of each image sample comprised in the set of image samples as an output, and another stage deformation parameter prediction sub-module is a U-shaped convolutional neural network model obtained by training with the set of image samples and output results of a previous deformation parameter prediction sub-module as inputs and a deformation parameter of each pixel of each image sample comprised in the set of image samples as an output; the intermediate result is an output result of a last-stage deformation parameter prediction sub-module of the at least two stages of deformation parameter prediction sub-modules.
2. The method according to claim 1, wherein the obtaining, according to the intermediate result, the corrected image corresponding to the distorted document image through the deformation correction module comprises: obtaining an operating parameter, the operating parameter indicating a number of pixels on which correction operations are performed in parallel; obtaining, according to the operating parameter, multiple pixels in the distorted document image; and correcting, according to deformation parameters respectively corresponding to the multiple pixels, the multiple pixels in parallel through the deformation correction module, and obtaining multiple corrected pixels.
3. The method according to claim 1, wherein the U-shaped convolutional neural network model comprises an encoding unit and a decoding unit, the encoding unit and the decoding unit each comprise multiple convolutional layers, and a convolutional layer in the encoding unit comprises multiple dilation convolution operations.
4. The method according to claim 1, wherein the U-shaped convolutional neural network model comprises an encoding unit and a decoding unit, the encoding unit and the decoding unit each comprise multiple convolutional layers, and a convolutional layer in the encoding unit comprises multiple dilation convolution operations.
5. The method according to claim 2, wherein the U-shaped convolutional neural network model comprises an encoding unit and a decoding unit, the encoding unit and the decoding unit each comprise multiple convolutional layers, and a convolutional layer in the encoding unit comprises multiple dilation convolution operations.
6. The method according to claim 3, wherein dilation ratios between the multiple dilation convolution operations comprised in the convolutional layer in the encoding unit gradually increase and are coprime.
7. The method according to claim 3, wherein the U-shaped convolutional neural network model further comprises a parallel convolution unit between the encoding unit and the decoding unit, the parallel convolution unit is configured to perform multiple dilation convolution operations in parallel on a feature map outputted by a last layer of the convolutional layers in the encoding unit, and dilation ratios between the multiple dilation convolution operations performed in parallel are different.
8. The method according to claim 3, wherein a convolutional layer in the decoding unit comprises a convolution operation and a recombination operation, the convolution operation is used for up-sampling a feature map, and the recombination operation is used for reconstructing the a number of rows, columns, and dimensions of a matrix for the up-sampled feature map.
9. An apparatus for correcting a distorted document image, comprising: a memory and a processor; wherein the memory is configured to store program instructions; and the processor is configured to call the program instructions stored in the memory to: obtain a distorted document image; and input the distorted document image into a correction model, and obtain a corrected image corresponding to the distorted document image; wherein the correction model is a model obtained by training with a set of image samples as inputs and a corrected image corresponding to each image sample in the set of image samples as an output, and the image samples are distorted, wherein the correction model comprises a deformation parameter prediction module and a deformation correction module connected in series; wherein the deformation parameter prediction module is a U-shaped convolutional neural network model obtained by training with the set of image samples as inputs and a deformation parameter of each pixel of each image sample comprised in the set of image samples as an output, and the deformation correction module is a model obtained by training with the set of image samples and output results of the deformation parameter prediction module as inputs and the corrected image corresponding to each image sample in the set of image samples as an output; the processor is specifically configured to: input the distorted document image into the correction model, output an intermediate result through the deformation parameter prediction module, and obtain, according to the intermediate result, the corrected image corresponding to the distorted document image through the deformation correction module; the intermediate result comprising a deformation parameter of each pixel in the distorted document image; wherein the deformation parameter prediction module comprises at least two stages of deformation parameter prediction sub-modules connected in series; wherein a first-stage deformation parameter prediction sub-module is a U-shaped convolutional neural network model obtained by training with the set of image samples as inputs and a deformation parameter of each pixel of each image sample comprised in the set of image samples as an output, and another stage deformation parame
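The condition in claim 6, that the dilation ratios "gradually increase and are coprime", can be checked with a short sketch. It rests on two assumptions the claims do not state: "coprime" is read as pairwise coprime, and the dilation convolutions inside one layer are applied one after another with stride 1 and a square kernel. The function names `valid_dilation_ratios` and `receptive_field` are hypothetical.

```python
from itertools import combinations
from math import gcd
from typing import Sequence


def valid_dilation_ratios(ratios: Sequence[int]) -> bool:
    """Return True if the ratios strictly increase and are pairwise coprime.

    Pairwise coprimality is an assumed reading of "are coprime" in claim 6.
    """
    increasing = all(a < b for a, b in zip(ratios, ratios[1:]))
    coprime = all(gcd(a, b) == 1 for a, b in combinations(ratios, 2))
    return increasing and coprime


def receptive_field(ratios: Sequence[int], kernel_size: int = 3) -> int:
    """Receptive field (one axis) of stacked stride-1 dilated convolutions.

    Each k-tap convolution with dilation d widens the field by (k - 1) * d.
    Stacking the operations sequentially is an assumption of this sketch.
    """
    return 1 + sum((kernel_size - 1) * d for d in ratios)


print(valid_dilation_ratios([1, 2, 3]))  # True
print(valid_dilation_ratios([2, 4, 8]))  # False: the ratios share the factor 2
print(receptive_field([1, 2, 3]))        # 13
```

Ratios that share a common factor, such as 2, 4, 8, sample the input on a regular sub-grid and can leave pixels between the taps unseen. Coprime ratios are a common way to avoid that gap, which is one plausible motivation for the wording of the claim.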