Keypoint-based multi-label word segmentation and localization
US-10878270-B1 · Dec 29, 2020 · US
US2022327816A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022327816-A1 |
| Application number | US-202217714322-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 6, 2022 |
| Priority date | Apr 9, 2021 |
| Publication date | Oct 13, 2022 |
| Grant date | — |
Official abstract text for this publication.
A system trains a machine learning model which recognizes characters of text images. The system stores the machine learning model which recognizes characters of text images. The machine learning model includes a character segmentation network which is configured to extract visual features from text images, and to generate character bounding boxes from the text images, a domain adaptation network configured to classify the text images into domains based on the visual features, and a text recognition network configured to recognize characters in the text images based on the character bounding boxes and the visual features. The system is configured to (1) reverse gradients in the training of the domain adaptation network to minus gradients and back-propagate the minus gradients through the character segmentation network, and (2) back-propagate gradients in the training of the text recognition network through the character segmentation network.
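The two back-propagation paths in the abstract can be illustrated with a toy scalar model. This is only a sketch under stated assumptions, not the patent's implementation: the single weights `w`, `v`, and `u` stand in for the character segmentation, domain adaptation, and text recognition networks, the squared-error losses, the reversal strength `lam`, and the learning rate `lr` are all hypothetical choices made for illustration.

```python
def training_step(x, domain_label, text_target, params, lam=1.0, lr=0.1):
    """One gradient step on a scalar toy model with a gradient reversal branch.

    Hypothetical stand-ins (not from the patent):
      w: shared feature extractor (character segmentation network)
      v: domain classifier head  (domain adaptation network)
      u: recognition head        (text recognition network)
    """
    w, v, u = params["w"], params["v"], params["u"]

    # Forward pass: one shared feature feeds both heads.
    feat = w * x
    dom_err = v * feat - domain_label   # residual of 0.5 * (v*feat - label)^2
    rec_err = u * feat - text_target    # residual of 0.5 * (u*feat - target)^2

    # Each head is trained on its own loss in the usual way.
    grad_v = dom_err * feat
    grad_u = rec_err * feat

    # Gradients arriving at the shared feature from each head.
    dom_grad_feat = dom_err * v
    rec_grad_feat = rec_err * u

    # Gradient reversal: the domain gradient is negated ("minus gradients")
    # before it flows into the shared extractor; the recognition gradient
    # flows through unchanged.
    grad_w = (rec_grad_feat - lam * dom_grad_feat) * x

    grads = {"w": grad_w, "v": grad_v, "u": grad_u}
    new_params = {name: params[name] - lr * grads[name] for name in grads}
    return new_params, grads


if __name__ == "__main__":
    start = {"w": 1.0, "v": 2.0, "u": 3.0}
    updated, grads = training_step(1.0, 0.0, 0.0, start)
    print(grads)
    print(updated)
```

With the reversal in place, the domain head still learns to separate domains, while the shared extractor is pushed toward features that make that separation harder. Setting `lam=0.0` disables the adversarial signal to the extractor, which makes the effect of the reversal easy to compare.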
Opening claim text (preview).
What is claimed is:

1. A system for training a machine learning model which recognizes characters of text images, the system comprising: one or more processors; and one or more storage devices, wherein the one or more storage devices store the machine learning model which recognizes characters of text images, wherein the machine learning model which recognizes characters of text images includes: a character segmentation network which is configured to extract visual features from text images, and to generate character bounding boxes from the text images; a domain adaptation network configured to classify the text images into domains based on the visual features; and a text recognition network configured to recognize characters in the text images based on the character bounding boxes and the visual features, and wherein the one or more processors are configured to: reverse gradients in training of the domain adaptation network to minus gradients, and to back-propagate the minus gradients through the character segmentation network; and back-propagate gradients in training of the text recognition network through the character segmentation network.

2. The system according to claim 1, wherein the domain adaptation network is configured to classify the text images into domains based on the character bounding boxes and the visual features.

3. The system according to claim 1, wherein the domain adaptation network includes: a layer configured to extract feature maps corresponding to the character bounding boxes from the visual features; a concatenation layer configured to concatenate the extracted feature maps; and a block configured to discriminate the domains of the text images based on the concatenated feature maps.

4. The system according to claim 1, wherein the text recognition network is configured to align visual features to output sequences by the character bounding box.

5. The system according to claim 1, wherein the text recognition network includes: an RNN encoder configured to encode the visual features; an RNN decoder configured to output character sequences; and an alignment layer provided between the RNN encoder and the RNN decoder, wherein the alignment layer is configured to align encoded features obtained from the RNN encoder, to a character sequences by the character bounding boxes obtained by the character segmentation network, and wherein the RNN decoder is configured to output character sequences from the extracted encoded features.

6. The system according to claim 1, further comprising: an input apparatus; and a monitor, wherein the one or more processors is configured to: display, on the monitor, output from at least one of the character segmentation network, the domain adaptation network, or the text recognition network; and receive a revision of the output which has been input from the input apparatus.

7. A method of training a machine learning model which recognizes characters of text images by a system, the system storing the machine learning model which recognizes characters of text images, the machine learning model which recognizes characters of text images including: a character segmentation network which is configured to extract visual features from text images, and to generate character bounding boxes from the text images; a domain adaptation network configured to classify the text images into domains based on the visual features; and a text recognition network configured to recognize characters in the text images based on the character bounding boxes and the visual features, the method comprising: reversing, by the system, gradients in the training of the domain adaptation network to minus gradients, and backpropagating the minus gradients through the character segmentation network; and back-propagating, by the system, gradients in the training of the text recognition network through the character segmentation network.

8. The method according to claim 7, further comprising of the domain adaptation network, classifying the text images into domains based on the character bounding boxes and the visual features.
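The domain adaptation structure recited in claim 3 (extract per-box feature maps, concatenate them, discriminate the domain) can be sketched as follows. Everything here is an illustrative assumption rather than the claimed design: the feature map is simplified to a left-to-right sequence of column vectors, each bounding box is reduced to a hypothetical `(start, end)` column range, average pooling is used to get a fixed-size vector per box, and the discriminator is a single logistic unit.

```python
import math


def crop_and_pool(features, box):
    """Average the feature columns covered by one character box.

    features: list of column vectors (one per horizontal position).
    box: hypothetical (start, end) column range, end exclusive.
    """
    start, end = box
    columns = features[start:end]
    channels = len(columns[0])
    return [sum(col[c] for col in columns) / len(columns) for c in range(channels)]


def concat_box_features(features, boxes):
    """Concatenate the pooled per-box features into one flat vector."""
    flat = []
    for box in boxes:
        flat.extend(crop_and_pool(features, box))
    return flat


def domain_score(vector, weights, bias=0.0):
    """Toy discriminator: logistic score that the image is from domain 1."""
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    feature_columns = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
    character_boxes = [(0, 2), (2, 4)]
    vector = concat_box_features(feature_columns, character_boxes)
    print(vector)
    print(domain_score(vector, [0.0] * len(vector)))
```

Because the concatenated vector grows with the number of boxes, a real discriminator would need some way to handle a varying character count (padding or further pooling, for example); the claims do not specify this, so the sketch simply assumes a fixed number of boxes.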