US12135940B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12135940-B2 |
| Application number | US-202117142566-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 6, 2021 |
| Priority date | Jan 6, 2020 |
| Publication date | Nov 5, 2024 |
| Grant date | Nov 5, 2024 |
Official abstract text for this publication.
A method for keyword extraction, an apparatus, an electronic device, and a computer-readable storage medium, all of which relate to the field of artificial intelligence, are provided. The method includes collecting feature information corresponding to an image to be processed, the feature information including text representation information and image visual information, and then extracting keywords from the image to be processed based on the feature information. The text representation information includes text content and text visual information corresponding to each text line in the image to be processed. The method, apparatus, electronic device, and computer-readable storage medium provided in the embodiments of the disclosure may extract the keywords from an image to be processed.
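As a rough illustration of the feature information the abstract describes, the following Python sketch groups text content and text visual information per text line alongside image-level visual features. All class and field names are hypothetical and do not come from the patent; in practice the values would be produced by OCR and a visual encoder, neither of which is shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    """One text line found in the image (hypothetical container)."""
    words: List[str]              # text content of the line
    visual_features: List[float]  # text visual information, e.g. encoded size, color, shape


@dataclass
class FeatureInfo:
    """Feature information for one image (hypothetical container)."""
    text_lines: List[TextLine]    # text representation information
    image_features: List[float]   # image visual information

    def all_words(self) -> List[str]:
        # Flatten the words of every text line, preserving reading order.
        return [word for line in self.text_lines for word in line.words]


# Placeholder values, not real model output.
info = FeatureInfo(
    text_lines=[
        TextLine(words=["Summer", "Sale"], visual_features=[0.9, 0.1, 0.4]),
        TextLine(words=["Ends", "Friday"], visual_features=[0.3, 0.2, 0.4]),
    ],
    image_features=[0.5, 0.5],
)
```

A keyword extractor in the sense of the abstract would take a structure like `info` as input; how the features are encoded and decoded is left to the claims.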
Opening claim text (preview).
What is claimed is:

1. A method for keyword extraction, the method comprising: collecting feature information corresponding to an image to be processed, the feature information including text representation information and image visual information, the text representation information including text content and text visual information corresponding to a text line in the image to be processed, the text visual information comprising a text feature map corresponding to the text line, the text feature map including text line visual features obtained by encoding visual information including a font size, a font color, and a font shape of the text line; and based on the image visual information and the text representation information including the text content and the text visual information including the text feature map including text line visual features obtained by encoding the visual information including the font size, the font color, and the font shape of the text line, extracting keywords from the image to be processed, the keywords extracted from the image to be processed being representative words or phrases summarizing the image to be processed.

2. The method of claim 1, wherein, for the text line, the text visual information corresponding to the text line further comprises at least one of: a text position in the image to be processed; word positions in the text line in the image to be processed; or word relative positions in the text line.

3. The method of claim 2, wherein the extracting of the keywords from the image to be processed comprises: encoding the feature information to obtain an encoded result of the feature information; and based on the encoded result, extracting keywords from the image to be processed, wherein the encoded result includes a text context representation, an image feature representation, and at least one of structure information or topic information representations of all text lines, wherein the text context representation is obtained based on the text representation information, wherein the image feature representation is obtained based on the image visual information, and wherein the structure information and topic information representations of all text lines are obtained based on the text context representation.

4. The method of claim 3, further comprising: decoding the text context representation, the image feature representation, and the at least one of structure information or topic information representations of all text lines; and based on the decoding, obtaining a keyword sequence comprising the keywords extracted from the image to be processed.

5. The method of claim 3, wherein the extracting of the keywords from the image to be processed further comprises: based on the encoded result, determining target prediction modes corresponding to each decoding time operation, respectively, and determining a prediction word corresponding to the target prediction modes; outputting prediction words corresponding to each decoding time operation, respectively; and based on a prediction word sequence of all decoding time operations, obtaining keywords.

6. The method of claim 5, wherein, for a decoding time operation, determining a target prediction mode corresponding to the decoding time operation and determining a prediction word corresponding to the target prediction mode comprises at least one of: based on the encoded result, determining prediction words of each pre-configured prediction mode corresponding to the decoding time operation, respectively, and determining the target prediction mode corresponding to the decoding time operation, and based on the prediction words of each pre-configured prediction mode and the target prediction mode corresponding to the decoding time operation, obtaining a prediction word corresponding to a target pre-stored mode; or based on the encoded result, determining the target prediction mode corresponding to the decoding time operation from each pre-configured prediction mode, and obtaining the prediction word corresponding to the target prediction mode.

7. The method of claim 6, wherein a pre-configured prediction mode comprises: a first prediction mode in which a keyword prediction is performed based on a common word dictionary; and a second prediction mode in which the keyword prediction is performed based on all words in input text lines.

8. The method of claim 7, wherein the determining of the prediction word corresponding to the target prediction mode comprises: in response to the target prediction mode being the second prediction mode, determining a weight corresponding to each word contained in the text content in the image to be processed based on the encoded result; and based on the weight corresponding to each word, determining the prediction word corresponding to the target prediction mode.

9. The method of claim 8, wherein the determining of the weight corresponding to each word contained in the text content in the image to be processed based on the encoded result comprises: based on the encoded result, obtaining a hidden vector corresponding to a current decoding time operation through feature fusion processing; and based on the text context representation and the hidden vector, determining the weight corresponding to each word contained in the text content in the image to be processed.

10. The method of claim 3, wherein the encoding of the feature information to obtain the encoded result corresponding to the feature information comprises at least one of: encoding the text representation information to obtain a text line representation; encoding the text line representation to obtain the text context representation; or encoding the text context representation to obtain a representation of the structure information and the topic information representations of all text lines.

11. The method of claim 10, wherein the text content includes a word sequence corresponding to the text line, wherein, for one text line, the text content of the one text line includes a word sequence corresponding to the one text line, and wherein, for the one text line, encoding the text content to obtain one text line representation comprises: encoding the word sequence corresponding to the one text line to obtain a character-based word representation; and based on the character-based word representation, determining the one text line representation corresponding to the one text line.

12. The method of claim 10, wherein the text line representation comprises a text line representation corresponding to at least one text line, and wherein the encoding of the text line representation to obtain the text context representation comprises: encoding the text line representation respectively to obtain a local text context representation corresponding to the text line; encoding all text line representations as a whole to obtain a global text context representation corresponding to all text lines; and based on the local text context representation corresponding to the text line and the global text context representation corresponding to all text lines, determining the text context representation corresponding to the text line.

13. A non-transitory computer-readable storage medium having stored thereon computer programs which, when are executed by a processor, perform the method of claim 1.

14. The method of claim 1, wherein the visual information used to obtain the text line visual features further includes a text background color, structure information, and border information, and wherein the text line visual features are furt
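The two prediction modes of claims 7 to 9 can be sketched for a single decoding step as follows. This is a minimal illustration under several assumptions that are not in the claim text: word weights are computed as a softmax over dot products between the hidden vector and each word's context representation, the target mode is chosen by a hard 0.5 threshold on a gate value, and all function names, vectors, and vocabularies are hypothetical. In a real system the hidden vector, gate, and dictionary scores would come from a trained decoder.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax over a list of scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def copy_weights(hidden: Sequence[float],
                 context: Sequence[Sequence[float]]) -> List[float]:
    # One weight per source word, from the hidden vector and the
    # text context representation (dot-product scoring is an assumption).
    return softmax([dot(hidden, vec) for vec in context])


def predict_word(hidden: Sequence[float],
                 context: Sequence[Sequence[float]],
                 source_words: Sequence[str],
                 dictionary: Sequence[str],
                 dictionary_scores: Sequence[float],
                 copy_gate: float) -> Tuple[str, str]:
    """Pick the target prediction mode, then the word for that mode."""
    if copy_gate >= 0.5:  # hard threshold is an assumption
        # Second mode: choose among the words in the input text lines.
        weights = copy_weights(hidden, context)
        best = max(range(len(source_words)), key=weights.__getitem__)
        return "copy", source_words[best]
    # First mode: choose from a common word dictionary.
    probs = softmax(dictionary_scores)
    best = max(range(len(dictionary)), key=probs.__getitem__)
    return "dictionary", dictionary[best]


# Placeholder inputs, not real model output.
hidden = [1.0, 0.0]
context = [[0.1, 0.9], [2.0, 0.0], [0.5, 0.5]]
source_words = ["summer", "sale", "today"]
dictionary = ["discount", "news"]
dictionary_scores = [0.2, 1.5]

copied = predict_word(hidden, context, source_words,
                      dictionary, dictionary_scores, copy_gate=0.8)
generated = predict_word(hidden, context, source_words,
                         dictionary, dictionary_scores, copy_gate=0.2)
```

With these placeholder inputs the copy mode selects "sale", the source word whose context vector best matches the hidden vector, while the dictionary mode selects "news". Repeating such a step across decoding time operations would yield the prediction word sequence from which claim 5 obtains the keywords.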
using neural networks · CPC title
Classification techniques · CPC title
Document-oriented image-based pattern recognition · CPC title
Character recognition · CPC title
Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title