Method and apparatus for correcting distorted document image

US11756170B2 · US · B2

Patent metadata
Publication number: US-11756170-B2
Application number: US-202117151783-A
Country: US
Kind code: B2
Filing date: Jan 19, 2021
Priority date: Jan 20, 2020
Publication date: Sep 12, 2023
Grant date: Sep 12, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure provide a method and apparatus for correcting a distorted document image, where the method for correcting a distorted document image includes: obtaining a distorted document image; and inputting the distorted document image into a correction model, and obtaining a corrected image corresponding to the distorted document image; where the correction model is a model obtained by training with a set of image samples as inputs and a corrected image corresponding to each image sample in the set of image samples as an output, and the image samples are distorted. By inputting the distorted document image to be corrected into the correction model, the corrected image corresponding to the distorted document image can be obtained through the correction model, which realizes document image correction end-to-end, improves accuracy of the document image correction, and extends application scenarios of the document image correction.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for correcting a distorted document image, comprising: obtaining a distorted document image; and inputting the distorted document image into a correction model, and obtaining a corrected image corresponding to the distorted document image; wherein the correction model is a model obtained by training with a set of image samples as inputs and a corrected image corresponding to each image sample in the set of image samples as an output, and the image samples are distorted, wherein the correction model comprises a deformation parameter prediction module and a deformation correction module connected in series; wherein the deformation parameter prediction module is a U-shaped convolutional neural network model obtained by training with the set of image samples as inputs and a deformation parameter of each pixel of each image sample comprised in the set of image samples as an output, and the deformation correction module is a model obtained by training with the set of image samples and output results of the deformation parameter prediction module as inputs and the corrected image corresponding to each image sample in the set of image samples as an output; the inputting the distorted document image into the correction model, and obtaining the corrected image corresponding to the distorted document image comprises: inputting the distorted document image into the correction model, outputting an intermediate result through the deformation parameter prediction module, and obtaining, according to the intermediate result, the corrected image corresponding to the distorted document image through the deformation correction module; the intermediate result comprising a deformation parameter of each pixel in the distorted document image; wherein the deformation parameter prediction module comprises at least two stages of deformation parameter prediction sub-modules connected in series; wherein a first-stage deformation parameter prediction sub-module is a U-shaped convolutional neural network model obtained by training with the set of image samples as inputs and a deformation parameter of each pixel of each image sample comprised in the set of image samples as an output, and another stage deformation parameter prediction sub-module is a U-shaped convolutional neural network model obtained by training with the set of image samples and output results of a previous deformation parameter prediction sub-module as inputs and a deformation parameter of each pixel of each image sample comprised in the set of image samples as an output; the intermediate result is an output result of a last-stage deformation parameter prediction sub-module of the at least two stages of deformation parameter prediction sub-modules.

2. The method according to claim 1, wherein the obtaining, according to the intermediate result, the corrected image corresponding to the distorted document image through the deformation correction module comprises: obtaining an operating parameter, the operating parameter indicating a number of pixels on which correction operations are performed in parallel; obtaining, according to the operating parameter, multiple pixels in the distorted document image; and correcting, according to deformation parameters respectively corresponding to the multiple pixels, the multiple pixels in parallel through the deformation correction module, and obtaining multiple corrected pixels.

3. The method according to claim 1, wherein the U-shaped convolutional neural network model comprises an encoding unit and a decoding unit, the encoding unit and the decoding unit each comprise multiple convolutional layers, and a convolutional layer in the encoding unit comprises multiple dilation convolution operations.

4. The method according to claim 1, wherein the U-shaped convolutional neural network model comprises an encoding unit and a decoding unit, the encoding unit and the decoding unit each comprise multiple convolutional layers, and a convolutional layer in the encoding unit comprises multiple dilation convolution operations.

5. The method according to claim 2, wherein the U-shaped convolutional neural network model comprises an encoding unit and a decoding unit, the encoding unit and the decoding unit each comprise multiple convolutional layers, and a convolutional layer in the encoding unit comprises multiple dilation convolution operations.

6. The method according to claim 3, wherein dilation ratios between the multiple dilation convolution operations comprised in the convolutional layer in the encoding unit gradually increase and are coprime.

7. The method according to claim 3, wherein the U-shaped convolutional neural network model further comprises a parallel convolution unit between the encoding unit and the decoding unit, the parallel convolution unit is configured to perform multiple dilation convolution operations in parallel on a feature map outputted by a last layer of the convolutional layers in the encoding unit, and dilation ratios between the multiple dilation convolution operations performed in parallel are different.

8. The method according to claim 3, wherein a convolutional layer in the decoding unit comprises a convolution operation and a recombination operation, the convolution operation is used for up-sampling a feature map, and the recombination operation is used for reconstructing the a number of rows, columns, and dimensions of a matrix for the up-sampled feature map.

9. An apparatus for correcting a distorted document image, comprising: a memory and a processor; wherein the memory is configured to store program instructions; and the processor is configured to call the program instructions stored in the memory to: obtain a distorted document image; and input the distorted document image into a correction model, and obtain a corrected image corresponding to the distorted document image; wherein the correction model is a model obtained by training with a set of image samples as inputs and a corrected image corresponding to each image sample in the set of image samples as an output, and the image samples are distorted, wherein the correction model comprises a deformation parameter prediction module and a deformation correction module connected in series; wherein the deformation parameter prediction module is a U-shaped convolutional neural network model obtained by training with the set of image samples as inputs and a deformation parameter of each pixel of each image sample comprised in the set of image samples as an output, and the deformation correction module is a model obtained by training with the set of image samples and output results of the deformation parameter prediction module as inputs and the corrected image corresponding to each image sample in the set of image samples as an output; the processor is specifically configured to: input the distorted document image into the correction model, output an intermediate result through the deformation parameter prediction module, and obtain, according to the intermediate result, the corrected image corresponding to the distorted document image through the deformation correction module; the intermediate result comprising a deformation parameter of each pixel in the distorted document image; wherein the deformation parameter prediction module comprises at least two stages of deformation parameter prediction sub-modules connected in series; wherein a first-stage deformation parameter prediction sub-module is a U-shaped convolutional neural network model obtained by training with the set of image samples as inputs and a deformation parameter of each pixel of each image sample comprised in the set of image samples as an output, and another stage deformation parame
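
Claims 1 and 2 describe correcting pixels from per-pixel deformation parameters, a group of pixels at a time. The following is a minimal sketch of that idea, not the patented implementation. It assumes that a deformation parameter is an integer (dx, dy) offset telling where a distorted pixel belongs in the corrected image; the claim text shown here does not define the parameter's form. The names `correct_pixel`, `correct_image`, and `background` are hypothetical, and in the claims the correction module is a trained model rather than a fixed rule.

```python
from concurrent.futures import ThreadPoolExecutor


def correct_pixel(image, offsets, x, y):
    """Return (new_x, new_y, value) for one pixel, using that pixel's own offset.

    Assumption: offsets[y][x] is an integer (dx, dy) pair.
    """
    dx, dy = offsets[y][x]
    return x + dx, y + dy, image[y][x]


def correct_image(image, offsets, operating_parameter, background=0):
    """Correct an image group by group.

    `operating_parameter` is the number of pixels handled concurrently,
    mirroring the operating parameter named in claim 2.
    """
    height, width = len(image), len(image[0])
    corrected = [[background] * width for _ in range(height)]
    coords = [(x, y) for y in range(height) for x in range(width)]

    with ThreadPoolExecutor(max_workers=operating_parameter) as pool:
        # Take `operating_parameter` pixels at a time and correct that group together.
        for start in range(0, len(coords), operating_parameter):
            group = coords[start:start + operating_parameter]
            results = pool.map(lambda c: correct_pixel(image, offsets, *c), group)
            for new_x, new_y, value in results:
                # Pixels that land outside the frame are dropped.
                if 0 <= new_x < width and 0 <= new_y < height:
                    corrected[new_y][new_x] = value
    return corrected


# Toy input: the second row of a 2x3 image has slid one pixel to the right.
distorted = [[1, 2, 3],
             [0, 4, 5]]
offsets = [[(0, 0), (0, 0), (0, 0)],
           [(-1, 0), (-1, 0), (-1, 0)]]
restored = correct_image(distorted, offsets, operating_parameter=2)
```

Python threads are used here only to show the grouping; a real system would more likely run the group on a GPU or with vectorized array operations, and would interpolate rather than copy whole pixels.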

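Claim 6 requires the dilation ratios inside an encoder layer to increase gradually and be coprime. The sketch below checks that condition and shows what it buys in one dimension. It assumes "coprime" means pairwise coprime and assumes a kernel size of 3; neither is stated in the claim text shown here. The function names are hypothetical. The claim gives no motivation, but a commonly cited reason for such a choice is to avoid the "gridding" gaps that appear when stacked dilations share a common factor.

```python
from itertools import combinations
from math import gcd


def is_increasing_and_coprime(dilations):
    """True if the dilation ratios strictly increase and are pairwise coprime."""
    increasing = all(a < b for a, b in zip(dilations, dilations[1:]))
    coprime = all(gcd(a, b) == 1 for a, b in combinations(dilations, 2))
    return increasing and coprime


def receptive_field(dilations, kernel_size=3):
    """Width of the 1-D span seen by one output of the stacked dilated convolutions."""
    # Each layer with dilation d widens the span by (kernel_size - 1) * d.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def covered_offsets(dilations, kernel_size=3):
    """Input offsets that actually contribute to one output position (1-D case)."""
    half = kernel_size // 2
    reach = {0}
    for d in dilations:
        reach = {r + k * d for r in reach for k in range(-half, half + 1)}
    return sorted(reach)


for ratios in ([1, 2, 3], [2, 4, 8]):
    span = receptive_field(ratios)
    used = len(covered_offsets(ratios))
    print(ratios, is_increasing_and_coprime(ratios), f"{used} of {span} positions used")
```

With ratios 1, 2, 3 every position inside the 13-wide span contributes to the output. With 2, 4, 8 the span is 29 wide but only the 15 even offsets contribute, so neighbouring input pixels never mix.
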
Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • Document

  • Convolutional networks [CNN, ConvNet]

  • G06T5/80 (Primary): Geometric correction

  • Erosion or dilatation, e.g. thinning

  • G06T5/60 (Primary): using machine learning, e.g. neural networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11756170B2 cover?
Embodiments of the present disclosure provide a method and apparatus for correcting a distorted document image, where the method for correcting a distorted document image includes: obtaining a distorted document image; and inputting the distorted document image into a correction model, and obtaining a corrected image corresponding to the distorted document image; where the correction model is a…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06T5/80. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 12, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).